Browser Tools
Navigate, click, type, screenshot -- the agent's browser toolbelt.
What you'll see
- browser_action navigates to URLs, types into fields, and clicks elements.
- get_text pulls page content; screenshot captures the view as a base64 image.
- execute_js runs page JavaScript (approval required); tab tools: get_tabs, switch_tab, new_tab, close_tab.
- Six selector strategies make element targeting robust on messy pages.
Commands shown
echo 'browser_action: navigate | type | click'
echo 'get_text -> page content | screenshot -> base64 image'
echo 'execute_js (approval) | tabs: get_tabs / switch_tab / new_tab / close_tab'
echo '6 selector strategies -> robust element targeting'
Keys used
Enter
Transcript
0:00 Navigate, click, type, screenshot -- the agent's browser toolbelt.
0:05 The core tool is browser_action -- it can navigate to a URL, type into a field, or click an element on the page.
0:15 To read the page, the agent uses get_text to pull content, and screenshot to capture the view as a base64 image it can reason over.
0:26 It can run arbitrary page JavaScript with execute_js -- which always requires your approval -- and manage tabs: get_tabs, switch_tab, new_tab, close_tab.
0:36 And to find elements reliably, it has six selector strategies to choose from -- so it can target the right thing even on messy pages.
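The transcript notes that screenshot returns the view as a base64 image. As a rough sketch of what consuming that result could look like, the Python below decodes such a string back into image bytes. The function names, the optional data-URL prefix, and the PNG check are assumptions made for illustration; the walkthrough does not specify the exact payload shape or image format.

```python
import base64

# First eight bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(payload: str) -> bytes:
    """Decode a base64 screenshot string into raw image bytes.

    Assumption: the payload is either bare base64 or a data URL such as
    "data:image/png;base64,<data>". The real tool's format is not shown
    in the walkthrough.
    """
    if payload.startswith("data:"):
        # Keep only the part after the first comma of the data URL.
        _, _, payload = payload.partition(",")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(payload, validate=True)


def looks_like_png(data: bytes) -> bool:
    """Return True if the bytes start with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)
```

A caller could write the decoded bytes to disk with `open(path, "wb")` for inspection, or pass them on to whatever image handling it already uses.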
Key takeaway
Browser tools cover navigate/type/click, get_text, screenshot, approval-gated execute_js, tab management, and six selector strategies.
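To make "approval-gated" concrete, here is a minimal Python sketch of one way a tool dispatcher might enforce that rule. Only the tool names come from the walkthrough; `dispatch`, `ApprovalDenied`, the `handlers` mapping, and the `approve` callback are hypothetical and do not describe the agent's actual implementation.

```python
from typing import Any, Callable, Dict

# Per the walkthrough, execute_js always requires approval.
# Modeling that as a set of gated tool names is an assumption.
APPROVAL_REQUIRED = frozenset({"execute_js"})


class ApprovalDenied(Exception):
    """Raised when the user declines a gated tool call."""


def dispatch(
    tool: str,
    args: Dict[str, Any],
    handlers: Dict[str, Callable[[Dict[str, Any]], Any]],
    approve: Callable[[str, Dict[str, Any]], bool],
) -> Any:
    """Run a tool handler, asking for approval first when the tool is gated."""
    handler = handlers.get(tool)
    if handler is None:
        raise KeyError(f"unknown tool: {tool}")
    # Gated tools run only if the approval callback returns True.
    if tool in APPROVAL_REQUIRED and not approve(tool, args):
        raise ApprovalDenied(f"{tool} was not approved")
    return handler(args)
```

In this sketch the approval check sits in the dispatcher rather than in each handler, so a gated tool cannot run without passing through it, while ungated tools such as get_text never trigger a prompt.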