8. `rysh-chrome-plugin` — Browser Control & Chat

rysh-chrome-plugin (~3.4K LOC of React/TypeScript in src/; ~5K including the dead root-level .js files — see §8.1) is a Chrome MV3 side-panel extension. It connects to rysh-server's NATS over a WebSocket and does two things: (1) lets the user chat with a server-side AI agent, and (2) lets that agent control the user's browser — navigate, click, type, scroll, screenshot, extract content, manage tabs.

8.1 What runs where

graph TB
    subgraph ext["Chrome extension"]
        SW["Background service worker
(background.js — legacy JS)
• sidePanel behavior
• GET_PAGE_CONTEXT injection
• auth check / sign-out
(NO NATS)"]
        Panel["Side-panel React app
(popup.html → src/main.tsx)
• NATS WebSocket
• chat UI + executor"]
        Inject["Injected MAIN-world scripts
(chrome.scripting, per-action)
• browser-executor
• selector-resolver"]
        AuthTab["auth.html / auth.js
(standalone onboarding)"]
    end
    Panel <-->|NATS over WS| Server["rysh-server
browser_pane_proxy"]
    Panel -->|chrome.scripting.executeScript| Inject
    SW -->|opens| AuthTab
    Panel -->|runtime.sendMessage GET_PAGE_CONTEXT| SW

Background service worker (background.js): sets sidePanel.setPanelBehavior({openPanelOnActionClick}), opens the auth tab when unauthenticated, and handles a chrome.runtime.onMessage bus (CHECK_AUTH, SIGN_OUT, OPEN_AUTH, GET_PAGE_CONTEXT). It does not touch NATS. GET_PAGE_CONTEXT injects a MAIN-world script capped at 30000 chars (falling back to tab metadata on protected pages):

func: () => {
  const MAX_BODY = 30000;
  return {
    title: document.title, url: location.href,
    selected: getSelection()?.toString() ?? '',
    description: document.querySelector('meta[name="description"]')?.content ?? '',
    body: (document.body?.innerText || '').substring(0, MAX_BODY).trim(),
  };
}

Side-panel React app: the live UI; hosts the persistent NATS WebSocket, the chat, and the BrowserActionExecutor.
No persistent content script: all page interaction is dynamic chrome.scripting.executeScript injection into the MAIN world. injectSelectorResolver installs window.__rysh_resolve_selector once per page.

Manifest (manifest_version: 3, name "Rysh AI", version 0.1.0): permissions: [storage, activeTab, scripting, tabs, sidePanel]; host_permissions: ["https://*.rysh.ai/*", "<all_urls>"]; side_panel.default_path: "popup.html"; background: {service_worker: "background.js", type: "module"}; web_accessible_resources: [auth.html]. There is no default_popup on the action — the panel opens via setPanelBehavior({openPanelOnActionClick: true}).

Legacy vs modern code

Files	Status
`background.js`, `authService.js`, `storage.js`, `auth.html`/`auth.js`/`auth.css`	ACTIVE — service worker + standalone auth tab (copied to `dist/` by Vite)
`src/` React/TS app	ACTIVE — the live UI
`api.js`, `nats-client.js`, `popup.js`	DEAD LEGACY — not copied, not referenced; superseded by `src/services/*`. They even use an old subject namespace (`agentic` vs the current `llm_prompt_execution`).

8.2 The NATS client (`src/services/nats-client.ts`)

A thin WebSocket wrapper speaking the Rysh envelope protocol (NOT raw NATS — the server's browser_pane_proxy bridges WS↔NATS).

Wire frame: JSON { subject, data: NATSEnvelope } where the TS envelope uses short keys and a base64-JSON payload:

interface NATSEnvelope { t: string; r: string; p: string; } // p = base64(JSON(payload))

static encode(typeTag: string, payload: Record<string, unknown>): NATSEnvelope {
  const bytes = new TextEncoder().encode(JSON.stringify(payload)); // UTF-8 bytes
  let binary = ''; bytes.forEach(b => { binary += String.fromCharCode(b); });
  return { t: typeTag, r: '', p: btoa(binary) };                   // UTF-8-safe base64
}

The UTF-8 TextEncoder→btoa dance avoids range errors on Unicode page text (a plain btoa(JSON) would throw). Auto-reconnect uses _baseReconnectDelay = 1000ms doubling to a 30000ms cap, up to _maxReconnectAttempts = 20; handlers (keyed by exact subject) survive reconnect. publish silently drops (warns) if the socket isn't OPEN.

Bootstrap (api.ts): POST {serverURL}/api/browser-panes with Authorization: Bearer {token} → { pane_id, ws_url }; the WS URL swaps https→wss and appends ?token=… (token auth via query param).

Subjects (all prefixed `rysh.pane.{paneID}.`)

Publishes (extension → server)	Tag
`.llm_prompt_execution.inbox`	`MsgAgenticPrompt`
`.llm_prompt_execution.inbox`	`MsgAgenticCancel`
`.approval.response`	`MsgApprovalResponse`
`.browser.response`	`MsgBrowserActionResponse`

Subscribes (server → extension)	Purpose
`.llm_prompt_execution.output`	streaming output `{type, content, metadata}`
`.llm_prompt_execution.status`	`{phase, iteration, max_iterations}`
`.approval.request`	`{request_id, type, description, diff, choices}`
`.output.{shell,rysh,chat}`	per-mode terminal text
`.share.command.inbound`	a remote CLI sent a command
`.browser.request`	the agent's browser-control requests

These map field-for-field to rysh-shared/msg (MsgBrowserActionRequest/Response, etc.).

8.3 Browser action execution (`src/services/browser-executor.ts`)

BrowserActionExecutor.execute(req) dispatches via a handler map. Element-targeting actions first inject the selector resolver (unless on a protected chrome:///about: page). Active-tab resolution queries lastFocusedWindow first (the side panel's own window is not the page window).

sequenceDiagram
    participant Agent as Server agent (browser_action tool)
    participant Ext as Extension (executor)
    participant Page as Target tab (MAIN world)
    Agent->>Ext: .browser.request {request_id, action, params}
    Ext->>Ext: injectSelectorResolver (if element action)
    Ext->>Page: chrome.scripting.executeScript (action fn)
    Page-->>Ext: result / error
    Ext->>Agent: .browser.response {request_id, success, result|error|screenshot}

The 23 actions:

Group	Actions
Navigation	`navigate`, `back`, `forward`, `reload`
Element interaction	`click`, `type`, `select`, `check`, `hover`, `press_key`, `drag_drop`
Scroll / wait	`scroll` (default 500px, smooth), `wait` (readyState or MutationObserver, default 10s)
Content extraction	`get_text`, `get_html` (each capped at 50000 chars), `get_elements` (default limit 50), `get_value`
Tabs	`get_tabs`, `switch_tab` (by `tab_id` \| `index` \| `url_pattern`, focuses the window), `new_tab`, `close_tab`
Screenshot	`screenshot` — `captureVisibleTab` as JPEG q40 (deliberately, to stay under NATS's ~1MB limit)
JS execution	`execute_js` — `eval` in MAIN world (approval gating is server-side, not enforced in the extension)

The type action uses the React-compatible native value setter so controlled inputs update (clear defaults to true):

const nativeSetter = Object.getOwnPropertyDescriptor(window.HTMLInputElement.prototype, 'value')?.set
                  || Object.getOwnPropertyDescriptor(window.HTMLTextAreaElement.prototype, 'value')?.set;
nativeSetter ? nativeSetter.call(el, newValue) : (el.value = newValue);
el.dispatchEvent(new Event('input',  { bubbles: true }));
el.dispatchEvent(new Event('change', { bubbles: true }));

screenshot and execute_js cores:

// screenshot — JPEG q40 keeps the base64 payload under NATS's 1MB max (a PNG would be 2-3MB)
const dataUrl = await chrome.tabs.captureVisibleTab(tab.windowId!, { format: 'jpeg', quality: 40 });
return { screenshot: dataUrl.replace(/^data:image\/jpeg;base64,/, ''), ... };

// execute_js — raw eval in the page's MAIN world (a notable security surface)
func: (code: string) => { try { const r = eval(code); return { result: r }; } catch (e:any) { return { error: e.message }; } }

waitForTabLoad listens on chrome.tabs.onUpdated for complete (+300ms settle, 15s cap). BrowserActionResult = { request_id, success, result?, error?, screenshot? }.

Selector resolver (`src/services/selector-resolver.ts`)

Injected once per page; resolves (selector, text?, index) by strategy prefix:

Prefix	Strategy
`xpath:` or `//`	`document.evaluate`
`text:`	direct text-node substring match (prefers shortest/most specific)
`aria:`	`[aria-label]` / `[aria-labelledby]`
`role:`	`[role=…]`
`testid:`	`[data-testid=…]`
(default)	CSS `querySelectorAll`

Then optionally filters by text, prefers visible elements, and returns pool[index].

8.4 Auth flow

API-key based against rysh-server (not Firebase, despite mirroring its API shape).

graph LR
    Setup["SetupScreen: Authorize API Key"] -->|OPEN_AUTH| BG["background opens auth.html"]
    BG --> Form["auth.js: server URL + API key"]
    Form -->|"probe {url}/health (5s)"| OK{reachable?}
    OK -->|yes| Store["chrome.storage.local:
auth_token, auth_user,
auth_time, server_url"]
    Store --> App["App.tsx: onAuthStateChanged →
loadServerURL → ensurePane → connected"]

The key is "verified" only by probing /health (reachability, not key validation). All requests send Authorization: Bearer {auth_token}; the WS URL carries ?token=…. Sign-out clears storage and deletes the pane.

Workspace badge (fetchServerInfo, api.ts): after auth, the extension calls GET {serverURL}/api/server-info (Bearer token) to resolve the NATS workspace name, caches it in chrome.storage.local under server_workspace, and renders it as a Header badge — used to tell CLI users which [upstream] workspace= / RYSH_UPSTREAM_WORKSPACE to set.

Settings panel (Header.tsx): a Settings modal (openSettings/saveSettings) lets the user change the server_url at runtime and re-fetch the workspace, independent of the initial auth.js onboarding.

8.5 State & UI

State (src/store.ts, Zustand): input mode (default 'prompt' — "AI mode (extension use-case)", cycleInputMode rotates MODE_CYCLE via modulo), chat messages, streaming accumulator (finalizeStreaming commits the buffer as one assistant Message with crypto.randomUUID()), terminal output buffers, connection (connected, paneID), loading/status, pendingApproval, per-mode input history (newest-first, .slice(0, 200), historyIdx default -1), share state, and browserAction (drives the indicator). escCount/escTimer hold double-ESC state (the 400ms window lives in useKeyboard.ts).

Types (src/types.ts):

export type InputMode = 'shell' | 'prompt' | 'rysh' | 'chat';
export const MODE_PROMPT: Record<InputMode,string>  = { shell:'>', prompt:'<', rysh:'##', chat:'@' };
export const MODE_LABELS: Record<InputMode,string>  = { shell:'SHELL', prompt:'AI', rysh:'RYSH', chat:'CHAT' };
export const MODE_PLACEHOLDER: Record<InputMode,string> = { /* per-mode input placeholder text */ };
export const MODE_CYCLE: InputMode[] = ['shell','prompt','rysh','chat'];

interface Message       { id; role:'user'|'assistant'|'tool'; content; timestamp:Date; mode:InputMode; streaming?; sender? }
interface PendingApproval { requestID; type; description; diff:DiffPayload|null; choices:{label,description}[] }
interface OutputEvent   { type: 'text'|'tool_call'|'tool_result'|'diff'|'error'|'shell'|'rysh'|'chat'|'user_prompt'|'browser_action'; content; metadata? }

Components:

Component	Role
`App.tsx`	resolves auth → `ChatScreen` or `SetupScreen`
`ChatScreen.tsx`	the central event router (wires `apiService.onOutput/onStatus/onApproval` into the store)
`Header.tsx`	logo, status, mode indicator, workspace badge, connection dot; Share / New-chat / Menu (use page context, settings, sign out)
`PaneInput.tsx`	mode-aware input; captures page context in prompt mode; send→cancel while loading
`PaneOutput.tsx`	terminal output (ANSI→HTML) for shell/rysh; message bubbles + streaming for prompt/chat; scroll-lock
`MessageBubble.tsx`	user/assistant/tool bubbles; markdown rendering; `↗ remote: {sender}` for shared prompts
`ApprovalDialog.tsx`	modal for `.approval.request`; diff + choices; Yes / Yes-Always / No / No+reason
`BrowserActionIndicator.tsx`	pulsing "Browser: {action}" banner
`ModeIndicator`, `ErrorBanner`, `LoadingIndicator`, `DebugOverlay`	status/UX helpers (DebugOverlay defaults open — a removable aid)

useKeyboard.ts: double-ESC within 400ms cycles the input mode. utils/ansi.ts: ANSI→HTML + a mini markdown renderer.

Client-side ## commands (submitInput in api.ts): shell/rysh-mode input starting with ## is handled locally rather than sent to the agent — ##share pane|panegroup|tab, ##share list, ##share status, ##unshare (a stub: "not yet implemented"), and ##help. Non-## shell input returns "No shell available in browser panes" (there is no PTY in a browser pane).

8.6 Key end-to-end flows

AI prompt with page context: PaneInput → GET_PAGE_CONTEXT (background injects into active tab) → POST /api/browser-panes/{id}/context + prepend <browser_page>…</browser_page> → MsgAgenticPrompt → server agent runs → streams back on .output (+ .status) → streaming bubble.

Agent-controls-browser loop (headline feature): server LLM calls browser_action → .browser.request → executor runs chrome.*/MAIN-world script → .browser.response → tool result returns to the LLM → loop continues.

Pane share (remote CLI drives the browser): Header Share → POST /api/browser-panes/{id}/share → {share_id, subscribe_cmd} → CLI runs ##upstream subscribe {id} → command arrives on .share.command.inbound → extension captures page context and republishes as MsgAgenticPrompt.

8.7 Caveats

api.js, nats-client.js, popup.js are dead code (the React TS port supersedes them and uses a different subject namespace).
The WS token is passed as a URL query param.
Screenshots are forced to JPEG q40 to stay under NATS's ~1MB limit.
execute_js runs raw eval in the page MAIN world; approval gating is server-side, not enforced in the extension.
Auth "verifies" the key only by probing /health.
DebugOverlay defaults to open and is explicitly flagged as a removable debugging aid.

8. rysh-chrome-plugin — Browser Control & Chat