Agent Browser

Headless browser automation CLI for AI agents — fast Rust CLI with Node.js fallback, perfect for web scraping, testing, and browser interactions.

Overview

Agent Browser is a production-ready browser automation tool designed specifically for AI agents. It provides a simple command-line interface backed by Playwright, making it easy for AI systems to control browsers programmatically.

Why Use Agent Browser with Skills

Unlike manual testing tools, Agent Browser is optimized for AI agent workflows:

Deterministic refs: Snapshot-based element selection for reliable AI interaction
CLI-first design: Easy for agents to invoke commands
Fast native binary: Rust CLI with Node.js fallback for speed and compatibility
Multiple sessions: Isolated browser instances for parallel workflows
Semantic locators: AI-friendly element selection by role, label, text

Installation

Quick Install

npm install -g agent-browser
agent-browser install  # Download Chromium

From Source

git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
pnpm build:native   # Requires Rust
pnpm link --global
agent-browser install

Core Workflow

The recommended AI workflow is to take a snapshot, act on the refs it returns, and re-snapshot after anything changes:

# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i
# Output:
# - heading "Example Domain" [ref=e1] [level=1]
# - button "Submit" [ref=e2]
# - textbox "Email" [ref=e3]

# 2. Use refs to interact
agent-browser click @e2
agent-browser fill @e3 "test@example.com"

# 3. Re-snapshot after changes
agent-browser snapshot -i
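A skill driving this loop has to turn snapshot text into refs it can pass back as @eN. The sketch below assumes the line format shown in the sample output (role, quoted name, [ref=...]); parseSnapshot and findRef are hypothetical helpers, not part of Agent Browser, and names containing escaped quotes are not handled.

```typescript
// Hypothetical entry type for one element line of `agent-browser snapshot -i`.
interface SnapshotRef {
  role: string;
  name: string;
  ref: string; // e.g. 'e2'; pass to the CLI as '@e2'
}

// Assumes lines shaped like the sample output:  - button "Submit" [ref=e2]
const LINE_RE = /^\s*-\s+(\S+)\s+"([^"]*)"\s+\[ref=(\w+)\]/;

function parseSnapshot(output: string): SnapshotRef[] {
  const refs: SnapshotRef[] = [];
  for (const line of output.split('\n')) {
    const m = LINE_RE.exec(line);
    if (m) refs.push({ role: m[1], name: m[2], ref: m[3] });
  }
  return refs;
}

// Look up the ref for an element by role and accessible name.
function findRef(refs: SnapshotRef[], role: string, name: string): string | undefined {
  return refs.find((r) => r.role === role && r.name === name)?.ref;
}
```

Because refs can change when the page changes, a skill would re-run parseSnapshot on each new snapshot rather than caching refs across actions.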

Key Features

Snapshot-Based Navigation

Get an accessibility tree with deterministic refs:

agent-browser snapshot -i -c -d 5

Options:

-i: Interactive elements only
-c: Compact mode (remove empty elements)
-d 5: Limit depth to 5 levels
-s "#main": Scope to selector
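When a skill builds these flags programmatically, keeping each flag and its value as separate argv elements avoids shell-quoting problems with selectors such as #main. A minimal sketch using only the four options listed above; SnapshotOptions and snapshotArgs are hypothetical names.

```typescript
// Hypothetical options type mirroring the documented snapshot flags.
interface SnapshotOptions {
  interactive?: boolean; // -i
  compact?: boolean;     // -c
  depth?: number;        // -d <n>
  selector?: string;     // -s <css selector>
}

// Build the argv (after the binary name) for a snapshot command.
function snapshotArgs(opts: SnapshotOptions = {}): string[] {
  const args = ['snapshot'];
  if (opts.interactive) args.push('-i');
  if (opts.compact) args.push('-c');
  if (opts.depth !== undefined) args.push('-d', String(opts.depth));
  if (opts.selector) args.push('-s', opts.selector);
  return args;
}
```

For example, snapshotArgs({ interactive: true, compact: true, depth: 5 }) yields the same arguments as the command shown above.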

Semantic Locators

Find elements by semantic meaning:

agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
agent-browser find text "Sign In" click
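The three locator forms above differ only in their target and in whether --name applies. A sketch of a typed builder that reproduces exactly those commands; the Locator union and findArgs are hypothetical, and placing --name after an optional action value is an assumption not shown in the examples.

```typescript
// Hypothetical union covering the locator kinds shown above.
type Locator =
  | { kind: 'role'; role: string; name?: string }
  | { kind: 'label'; label: string }
  | { kind: 'text'; text: string };

// Build argv for `agent-browser find <kind> <target> <action> [value] [--name <name>]`.
function findArgs(loc: Locator, action: string, value?: string): string[] {
  const target = loc.kind === 'role' ? loc.role : loc.kind === 'label' ? loc.label : loc.text;
  const args = ['find', loc.kind, target, action];
  if (value !== undefined) args.push(value);
  // --name only appears with role locators in the examples above.
  if (loc.kind === 'role' && loc.name !== undefined) args.push('--name', loc.name);
  return args;
}
```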

Multiple Sessions

Run isolated browser instances:

agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com

Each session has separate cookies, storage, and auth state.
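Because sessions are isolated, a skill can drive several of them concurrently. A sketch, assuming the agent-browser binary is on PATH; the Exec type, realExec, inSession, and openAll are hypothetical names, and the executor is injected so the orchestration can be tested without a browser.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

// Runs agent-browser with an argv array and resolves to its stdout.
type Exec = (args: string[]) => Promise<string>;

const execFileAsync = promisify(execFile);

// Real executor: no shell is involved, so URLs and session names are passed verbatim.
const realExec: Exec = async (args) => (await execFileAsync('agent-browser', args)).stdout;

// Prefix a command with the global --session flag, as in the examples above.
function inSession(session: string, args: string[]): string[] {
  return ['--session', session, ...args];
}

// Open one URL per session concurrently; each session keeps its own state.
function openAll(exec: Exec, targets: Record<string, string>): Promise<string[]> {
  return Promise.all(
    Object.entries(targets).map(([session, url]) => exec(inSession(session, ['open', url]))),
  );
}

// Usage: openAll(realExec, { agent1: 'site-a.com', agent2: 'site-b.com' });
```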

Authenticated Sessions

Skip login flows with scoped headers:

agent-browser open api.example.com --headers '{
  "Authorization": "Bearer <token>"
}'

Headers are scoped to that origin, so they are not sent with requests to other domains.
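Hand-writing the JSON for --headers is easy to get wrong. A sketch that builds it with JSON.stringify and keeps the token out of source code; openWithHeaders is a hypothetical helper and API_TOKEN is an assumed environment variable name.

```typescript
// Build argv for `open <url> --headers <json>`, following the example above.
// JSON.stringify guarantees well-formed JSON, and passing argv to execFile
// (rather than a shell string) means the JSON needs no extra quoting.
function openWithHeaders(url: string, headers: Record<string, string>): string[] {
  return ['open', url, '--headers', JSON.stringify(headers)];
}

// API_TOKEN is a hypothetical variable name; read secrets from the environment.
const apiToken = process.env.API_TOKEN ?? '';
const openArgs = openWithHeaders('api.example.com', { Authorization: `Bearer ${apiToken}` });
```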

Common Commands

Navigation

agent-browser open <url>              # Navigate
agent-browser back                    # Go back
agent-browser forward                 # Go forward
agent-browser reload                  # Reload

Interaction

agent-browser click <sel>             # Click element
agent-browser fill <sel> <text>       # Clear and fill
agent-browser type <sel> <text>       # Type into element
agent-browser hover <sel>             # Hover
agent-browser scroll <dir> [px]       # Scroll up/down/left/right

Information

agent-browser get text <sel>          # Get text content
agent-browser get value <sel>         # Get input value
agent-browser get attr <sel> <attr>   # Get attribute
agent-browser get title               # Get page title

Wait Conditions

agent-browser wait <selector>         # Wait for element
agent-browser wait 5000               # Wait 5 seconds
agent-browser wait --text "Welcome"   # Wait for text
agent-browser wait --load networkidle # Wait for network idle
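The four wait forms map naturally onto a discriminated union, which stops a skill from mixing, say, a timeout and a text condition in one call. A sketch covering only the forms listed above (the load state is limited to networkidle because that is the only value shown); WaitFor and waitArgs are hypothetical names.

```typescript
// Hypothetical union: exactly one wait condition per command.
type WaitFor =
  | { selector: string }
  | { ms: number }
  | { text: string }
  | { load: 'networkidle' };

// Build argv for the matching `agent-browser wait` form.
function waitArgs(w: WaitFor): string[] {
  if ('selector' in w) return ['wait', w.selector];
  if ('ms' in w) return ['wait', String(w.ms)];
  if ('text' in w) return ['wait', '--text', w.text];
  return ['wait', '--load', w.load];
}
```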

Integration with Skills

Skill Integration Pattern

A skill can invoke Agent Browser commands:

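import { execFile } from 'child_process';
import { promisify } from 'util';

// exec/execFile are callback-based; promisify them so they can be awaited.
const run = promisify(execFile);

async function scrapeForm(url: string) {
  // execFile passes arguments without a shell, so `url` cannot inject commands
  await run('agent-browser', ['open', url]);
  const { stdout } = await run('agent-browser', ['snapshot', '-i']);
  // Parse snapshot, extract refs
  // Execute actions using refs
  return stdout;
}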

Best Practices

Always snapshot before acting: Get current state before making changes
Use refs for reliability: Avoid brittle selectors
Handle multiple sessions: Use sessions for parallel workflows
Implement retry logic: Browser operations can fail
Clean up resources: Always call agent-browser close when done
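Several of these practices can be combined in one helper: snapshot before acting, retry flaky steps, and close the browser even when a step fails. A sketch under assumptions: the Exec executor type, withRetry, and clickFromSnapshot are hypothetical; the backoff doubles from baseMs; and agent-browser close is assumed to be safe to call after a failed step.

```typescript
// Runs agent-browser with an argv array and resolves to stdout (injected for testability).
type Exec = (args: string[]) => Promise<string>;

// Retry an async operation with exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
async function withRetry<T>(op: () => Promise<T>, attempts = 3, baseMs = 500): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await op();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        const delay = baseMs * 2 ** i;
        await new Promise<void>((resolve) => setTimeout(() => resolve(), delay));
      }
    }
  }
  throw lastErr;
}

// Open a page, snapshot it, click the ref chosen from that snapshot, then close.
async function clickFromSnapshot(
  exec: Exec,
  url: string,
  pickRef: (snapshot: string) => string,
): Promise<void> {
  try {
    await withRetry(() => exec(['open', url]));
    const snapshot = await exec(['snapshot', '-i']); // current state before acting
    await withRetry(() => exec(['click', `@${pickRef(snapshot)}`]));
  } finally {
    await exec(['close']); // clean up even if a step failed
  }
}
```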

Use Cases in Skills

Web Testing Skills

Skills that need to test web applications can use Agent Browser for:

Form validation
UI regression testing
Accessibility testing
Cross-browser testing

Data Extraction Skills

Skills that scrape data can leverage:

Semantic element selection
Dynamic content handling
Authentication state management
Rate limiting via session isolation

Monitoring Skills

Skills that monitor websites can use:

Scheduled checks
Screenshot capture
Content change detection
Error page detection
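For content change detection, a monitoring skill might store a fingerprint of the page text between scheduled checks instead of the full text. A minimal sketch, assuming the text comes from agent-browser get text <sel>; fingerprint and hasChanged are hypothetical helpers, and collapsing whitespace is a design choice that ignores reflow-only changes.

```typescript
import { createHash } from 'node:crypto';

// SHA-256 of the normalized text; whitespace runs collapse to one space.
function fingerprint(text: string): string {
  const normalized = text.replace(/\s+/g, ' ').trim();
  return createHash('sha256').update(normalized).digest('hex');
}

// True when the current text differs from the previously stored fingerprint.
function hasChanged(previousFingerprint: string, currentText: string): boolean {
  return previousFingerprint !== fingerprint(currentText);
}
```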

Advanced Features

Streaming (Live Preview)

Enable live browser preview:

AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com

This starts a WebSocket server on the given port; clients connect to it to receive streamed frames and inject input events.

CDP Mode

Connect to existing browser instances:

agent-browser --cdp 9222 snapshot

Useful for controlling Electron apps, or Chrome launched with remote debugging enabled (--remote-debugging-port=9222).

Custom Executable

Use lightweight browser builds:

AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open <url>

Useful for serverless deployments, where the full Chromium download may be too large.
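When a skill launches Agent Browser as a child process, the AGENT_BROWSER_EXECUTABLE_PATH and AGENT_BROWSER_STREAM_PORT variables described above can be set per invocation instead of globally. A sketch, assuming the agent-browser binary is on PATH; BrowserEnv, browserEnv, and openWithCustomBrowser are hypothetical names.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Hypothetical config shape; the variable names are the documented ones.
interface BrowserEnv {
  executablePath?: string; // AGENT_BROWSER_EXECUTABLE_PATH
  streamPort?: number;     // AGENT_BROWSER_STREAM_PORT
}

// Copy the base environment and add only the variables that were configured.
function browserEnv(cfg: BrowserEnv, base: NodeJS.ProcessEnv = process.env): NodeJS.ProcessEnv {
  const env: NodeJS.ProcessEnv = { ...base };
  if (cfg.executablePath) env.AGENT_BROWSER_EXECUTABLE_PATH = cfg.executablePath;
  if (cfg.streamPort !== undefined) env.AGENT_BROWSER_STREAM_PORT = String(cfg.streamPort);
  return env;
}

// Open a page using a custom Chromium build (not executed in this sketch).
async function openWithCustomBrowser(url: string, cfg: BrowserEnv): Promise<string> {
  const { stdout } = await execFileAsync('agent-browser', ['open', url], {
    env: browserEnv(cfg),
    encoding: 'utf8',
  });
  return stdout;
}
```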

Limitations

Requires a Chromium download (~684MB) or a custom executable
Runs headless by default (use the --headed flag to show the browser window)
No built-in parallel execution (use multiple sessions)
Some Playwright features not exposed via CLI

Technical Details

Architecture: Rust CLI + Node.js daemon + Playwright
Platforms: macOS, Linux, Windows (native + fallback)
Browser Engine: Chromium (Playwright)
License: Apache-2.0

Any skill that needs browser interaction can integrate Agent Browser:

Web scraping skills: Extract data from websites
Testing skills: Automated UI/acceptance testing
Monitoring skills: Website health checks
Form automation skills: Data entry workflows

Example Skill Integration

See the official Agent Browser Skill for a complete example of how to integrate this tool into a Claude Code skill.
