Browser Tools

#108 0:15 Part of the Rysh video series

Navigate, click, type, screenshot -- the agent's browser toolbelt.

Browser Tools — 0:15 walkthrough

What you'll see

browser_action navigates to URLs, types into fields, and clicks elements.
get_text pulls page content; screenshot captures the view as a base64 image.
execute_js runs arbitrary page JavaScript and always requires your approval; the tab tools are get_tabs, switch_tab, new_tab, and close_tab.
Six selector strategies make element targeting robust on messy pages.
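
To make the tab tools concrete, here is a minimal bash (4+) sketch that models get_tabs, switch_tab, new_tab, and close_tab as shell functions over an in-memory list. It does not drive a browser. The numeric tab ids, the output format, and the rule that closing the focused tab moves focus to the last remaining one are assumptions for illustration, not Rysh's actual behavior.

```bash
#!/usr/bin/env bash
# Hypothetical in-memory model of the four tab tools (bash 4+).
# Tab ids, output format, and the refocus rule are illustrative assumptions.

declare -a TABS=()   # sparse indexed array: tab id -> URL
NEXT_ID=0            # ids are never reused after a tab closes
ACTIVE=""            # id of the focused tab, empty when no tab is open

tab_exists() {       # true if $1 is the id of an open tab
  [[ $1 =~ ^[0-9]+$ ]] && [[ -n ${TABS[$1]+set} ]]
}

new_tab() {          # new_tab URL: open a tab and focus it
  TABS[NEXT_ID]=$1
  ACTIVE=$NEXT_ID
  NEXT_ID=$((NEXT_ID + 1))
}

get_tabs() {         # one line per open tab, "id url", active tab marked with '*'
  local id mark
  for id in "${!TABS[@]}"; do
    if [[ $id == "$ACTIVE" ]]; then mark='*'; else mark=' '; fi
    printf '%s %s %s\n' "$mark" "$id" "${TABS[$id]}"
  done
}

switch_tab() {       # switch_tab ID: focus an open tab
  tab_exists "$1" || { echo "no such tab: $1" >&2; return 1; }
  ACTIVE=$1
}

close_tab() {        # close_tab ID: close a tab; if it was focused, focus the last remaining one
  local id=$1 remaining
  tab_exists "$id" || { echo "no such tab: $id" >&2; return 1; }
  unset "TABS[$id]"
  if [[ $id == "$ACTIVE" ]]; then
    ACTIVE=""
    remaining=("${!TABS[@]}")
    if (( ${#remaining[@]} > 0 )); then
      ACTIVE=${remaining[${#remaining[@]}-1]}
    fi
  fi
}
```

For example, after new_tab https://example.com and new_tab https://example.org, get_tabs lists both with tab 1 marked active. Running close_tab 1 then moves focus back to tab 0.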

Commands shown

echo 'browser_action: navigate | type | click'
echo 'get_text -> page content   |   screenshot -> base64 image'
echo 'execute_js (approval) | tabs: get_tabs / switch_tab / new_tab / close_tab'
echo '6 selector strategies -> robust element targeting'

Keys used

Enter

Transcript

0:00 Navigate, click, type, screenshot -- the agent's browser toolbelt.

0:05 The core tool is browser_action -- it can navigate to a URL, type into a field, or click an element on the page.
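
One way to picture the three browser_action modes is as a tool call whose required arguments depend on the action. The bash sketch below assumes a positional layout (navigate URL, type SELECTOR TEXT, click SELECTOR) and a JSON output shape with the hypothetical keys action, url, selector, and text. The video does not show Rysh's real call format.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of browser_action's three modes.
# The positional layout and the JSON keys are assumptions, not Rysh's real format.

json_str() {   # quote $1 as a JSON string; escapes only backslash and double quote
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '"%s"' "$s"
}

# browser_action navigate URL | type SELECTOR TEXT | click SELECTOR
browser_action() {
  local action=${1-}
  (( $# > 0 )) && shift
  case $action in
    navigate)
      [[ $# -eq 1 && -n $1 ]] || { echo "usage: browser_action navigate URL" >&2; return 2; }
      printf '{"action":"navigate","url":%s}\n' "$(json_str "$1")" ;;
    type)
      [[ $# -eq 2 && -n $1 ]] || { echo "usage: browser_action type SELECTOR TEXT" >&2; return 2; }
      printf '{"action":"type","selector":%s,"text":%s}\n' "$(json_str "$1")" "$(json_str "$2")" ;;
    click)
      [[ $# -eq 1 && -n $1 ]] || { echo "usage: browser_action click SELECTOR" >&2; return 2; }
      printf '{"action":"click","selector":%s}\n' "$(json_str "$1")" ;;
    *)
      echo "unknown action: $action" >&2
      return 2 ;;
  esac
}
```

For instance, browser_action type '#q' 'rysh' prints {"action":"type","selector":"#q","text":"rysh"}. A call such as browser_action click with no selector is rejected before anything reaches the page.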

0:15 To read the page, the agent uses get_text to pull content, and screenshot to capture the view as a base64 image it can reason over.
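
The video only says that the screenshot arrives as a base64 image. If that image is a PNG (an assumption), a quick sanity check is to decode the payload and compare its first eight bytes with the PNG file signature. That signature is also why base64-encoded PNGs begin with iVBORw0KGgo. The is_png_b64 helper and the optional data:image/png;base64, prefix it strips are illustrative, not part of Rysh.

```bash
#!/usr/bin/env bash
# Hypothetical check that a screenshot payload is a base64 PNG (assumes PNG output).
is_png_b64() {
  local b64=${1#data:image/png;base64,}   # tolerate an optional data-URI prefix
  local sig
  sig=$(printf '%s' "$b64" | base64 -d 2>/dev/null | head -c 8 | od -An -tx1 | tr -d ' \n')
  [[ $sig == 89504e470d0a1a0a ]]          # hex of the 8-byte PNG signature
}

# Build a fake payload from the PNG signature alone (octal escapes keep printf portable).
fake=$(printf '\211PNG\r\n\032\n' | base64)
echo "$fake"                               # prints iVBORw0KGgo=
is_png_b64 "$fake" && echo "PNG signature found"
```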

0:26 It can run arbitrary page JavaScript with execute_js -- which always requires your approval -- and manage tabs: get_tabs, switch_tab, new_tab, close_tab.
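
The approval requirement on execute_js acts as a gate in front of the call. Below is a minimal bash sketch of such a gate, assuming a y/N prompt in the terminal. The function name approve_execute_js and the prompt wording are hypothetical, and the sketch only decides whether to proceed. It does not run any JavaScript.

```bash
#!/usr/bin/env bash
# Hypothetical approval gate for execute_js (bash 4+ for ${var,,}).
# Shows the script and asks on stderr, then reads the answer from stdin.
# Returns 0 only for "y" or "yes" (any case); anything else, including
# empty input or closed stdin, counts as a refusal.
approve_execute_js() {
  local script=$1 answer=""
  printf 'Agent wants to run this JavaScript:\n%s\nApprove? [y/N] ' "$script" >&2
  read -r answer || :
  case ${answer,,} in
    y|yes) return 0 ;;
    *) echo "execute_js denied" >&2; return 1 ;;
  esac
}
```

A caller would chain it, as in approve_execute_js "$js" && run_js_in_page "$js", where run_js_in_page is a hypothetical stand-in for the real tool. Because the gate defaults to refusal, neither an empty answer nor closed input ever runs code.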

0:36 And to find elements reliably, it has six selector strategies to choose from -- so it can target the right thing even on messy pages.
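
The video does not name the six strategies, so the sketch below only illustrates the general fallback idea: try locators in priority order and take the first one that resolves. The strategy labels (id:, text:, aria-label:, and so on) and the fake page are hypothetical. The locator_matches function stands in for a real browser query.

```bash
#!/usr/bin/env bash
# Hypothetical locator fallback: try strategies in priority order, keep the first hit.
# Strategy labels and the fake page are illustrative; the video does not list Rysh's six.

# Fake page: the locators that would resolve to an element, one per line.
PAGE_MATCHES=$'text:Sign in\naria-label:Sign in'

locator_matches() {   # stand-in for a real browser lookup
  grep -Fqx -- "$1" <<< "$PAGE_MATCHES"
}

resolve_element() {   # resolve_element LOCATOR...: print the first locator that matches
  local loc
  for loc in "$@"; do
    if locator_matches "$loc"; then
      printf '%s\n' "$loc"
      return 0
    fi
  done
  echo "no locator matched" >&2
  return 1
}
```

Here, resolve_element 'id:login-btn' 'text:Sign in' 'aria-label:Sign in' prints text:Sign in. The id lookup fails on this fake page, so the text lookup is the first one that matches. Ordering candidates from the most specific to the fuzziest keeps the match precise whenever a precise locator exists.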

Key takeaway

Browser tools cover navigate/type/click, get_text, screenshot, approval-gated execute_js, tab management, and six selector strategies.