Tool · Updated 2026-01-18

Agent Browser

Headless browser automation CLI for AI agents: fast Rust CLI with Node.js fallback, perfect for web scraping, testing, and browser interactions.

Tags: web-automation, testing, scraping, monitoring, browser, automation, cli
Stars: 7,200
Author: vercel-labs

Overview

Agent Browser is a production-ready browser automation tool designed specifically for AI agents. It provides a CLI backed by Playwright, so AI systems can control a browser programmatically.

Why Use Agent Browser with Skills

Unlike manual testing tools, Agent Browser is optimized for AI agent workflows:

  • Deterministic refs: Snapshot-based element selection for reliable AI interaction
  • CLI-first design: Easy for agents to invoke commands
  • Fast native binary: Rust CLI with Node.js fallback for speed and compatibility
  • Multiple sessions: Isolated browser instances for parallel workflows
  • Semantic locators: AI-friendly element selection by role, label, text

Installation

Quick Install

npm install -g agent-browser
agent-browser install  # Download Chromium

From Source

git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
pnpm build:native   # Requires Rust
pnpm link --global
agent-browser install

Core Workflow

The recommended workflow for AI agents is to take a snapshot, act on the refs it returns, and snapshot again after the page changes:

# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i
# Output:
# - heading "Example Domain" [ref=e1] [level=1]
# - button "Submit" [ref=e2]
# - textbox "Email" [ref=e3]

# 2. Use refs to interact
agent-browser click @e2
agent-browser fill @e3 "test@example.com"

# 3. Re-snapshot after changes
agent-browser snapshot -i
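
The refs from step 1 can be pulled out of the snapshot text before step 2 uses them. This is a minimal sketch, assuming each line follows the `- role "name" [ref=eN]` shape shown in the sample output above. `SnapshotElement` and `parseSnapshot` are hypothetical names, and real output may carry attributes this pattern ignores.

```typescript
// Hypothetical helper: the line format is inferred from the sample output only.
interface SnapshotElement {
  role: string;
  name: string;
  ref: string; // e.g. "e2", written as "@e2" on the command line
}

function parseSnapshot(snapshot: string): SnapshotElement[] {
  // Matches lines such as: - button "Submit" [ref=e2]
  const pattern = /^\s*-\s+(\w+)\s+"([^"]*)"\s+\[ref=(\w+)\]/;
  const elements: SnapshotElement[] = [];
  for (const raw of snapshot.split("\n")) {
    const match = pattern.exec(raw);
    if (match) {
      elements.push({ role: match[1], name: match[2], ref: match[3] });
    }
  }
  return elements;
}

const sample = [
  '- heading "Example Domain" [ref=e1] [level=1]',
  '- button "Submit" [ref=e2]',
  '- textbox "Email" [ref=e3]',
].join("\n");

// Look up the Submit button and print the ref to pass to `click`.
const submit = parseSnapshot(sample).filter(
  (el) => el.role === "button" && el.name === "Submit"
)[0];
console.log(submit ? "@" + submit.ref : "not found"); // prints "@e2"
```

Lines that do not match the pattern are skipped rather than treated as errors, so unexpected output degrades to "element not found" instead of crashing the agent.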

Key Features

Snapshot-Based Navigation

Get an accessibility tree with deterministic refs:

agent-browser snapshot -i -c -d 5

Options:

  • -i: Interactive elements only
  • -c: Compact mode (remove empty elements)
  • -d 5: Limit depth to 5 levels
  • -s "#main": Scope to selector
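
When an agent assembles the command itself, the four options above map onto a small typed builder. This is a sketch only: `SnapshotOptions` and `snapshotArgs` are hypothetical names, and the flags are limited to the ones listed above.

```typescript
// Hypothetical option bag covering only the flags listed above.
interface SnapshotOptions {
  interactive?: boolean; // -i
  compact?: boolean; // -c
  depth?: number; // -d <n>
  scope?: string; // -s <selector>
}

// Returns an argument array suitable for execFile/spawn, so the selector
// is passed as a single argument and never needs shell quoting.
function snapshotArgs(options: SnapshotOptions = {}): string[] {
  const args = ["snapshot"];
  if (options.interactive) args.push("-i");
  if (options.compact) args.push("-c");
  if (options.depth !== undefined) {
    const d = options.depth;
    // Reject NaN, infinities, fractions, and values below 1.
    if (!isFinite(d) || d < 1 || Math.floor(d) !== d) {
      throw new RangeError("depth must be a positive integer");
    }
    args.push("-d", String(d));
  }
  if (options.scope) args.push("-s", options.scope);
  return args;
}

console.log(snapshotArgs({ interactive: true, compact: true, depth: 5 }).join(" "));
// prints "snapshot -i -c -d 5"
```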

Semantic Locators

Find elements by semantic meaning:

agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
agent-browser find text "Sign In" click

Multiple Sessions

Run isolated browser instances:

agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com

Each session has separate cookies, storage, and auth state.
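
Because each session is isolated, separate agents can drive separate sessions concurrently. The sketch below assumes only the `--session <name>` flag shown above. `Runner`, `withSession`, and `openAll` are hypothetical names, and the runner is injected so the example runs without the CLI installed (a real runner would wrap `child_process.execFile`).

```typescript
// A Runner executes one agent-browser invocation and resolves with its stdout.
type Runner = (args: string[]) => Promise<string>;

// Prefix any command with the session flag shown above.
function withSession(session: string, args: string[]): string[] {
  return ["--session", session].concat(args);
}

// Open one URL per session; the invocations run concurrently.
function openAll(
  run: Runner,
  targets: { [session: string]: string }
): Promise<string[]> {
  const sessions = Object.keys(targets);
  return Promise.all(
    sessions.map((session) => run(withSession(session, ["open", targets[session]])))
  );
}

// Stand-in runner that echoes the command line instead of launching a browser.
const echoRunner: Runner = (args) =>
  Promise.resolve("agent-browser " + args.join(" "));

openAll(echoRunner, { agent1: "site-a.com", agent2: "site-b.com" }).then((lines) => {
  lines.forEach((line) => console.log(line));
});
```

Injecting the runner also makes the orchestration logic testable without a browser.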

Authenticated Sessions

Skip login flows with scoped headers:

agent-browser open api.example.com --headers '{
  "Authorization": "Bearer <token>"
}'

Headers are scoped to the origin they were set for, so the token is not sent to other domains.
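
Hand-quoting JSON inside a shell string is easy to get wrong, so a skill can serialize the headers itself and pass them as one argument. This is a minimal sketch assuming only the `--headers` flag and JSON shape shown above. `openWithHeaders` is a hypothetical helper and the token is a placeholder.

```typescript
// Builds the argument array for: agent-browser open <url> --headers '<json>'
function openWithHeaders(
  url: string,
  headers: { [name: string]: string }
): string[] {
  for (const name of Object.keys(headers)) {
    // Reject empty names, names containing a colon, and any CR/LF,
    // none of which are valid in an HTTP header.
    if (name.trim() === "" || /[\r\n:]/.test(name) || /[\r\n]/.test(headers[name])) {
      throw new Error("invalid header: " + JSON.stringify(name));
    }
  }
  // JSON.stringify handles the quoting; execFile/spawn passes it as one argument.
  return ["open", url, "--headers", JSON.stringify(headers)];
}

const headerArgs = openWithHeaders("api.example.com", {
  Authorization: "Bearer <token>",
});
console.log(headerArgs[3]); // prints {"Authorization":"Bearer <token>"}
```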

Common Commands

Navigation

agent-browser open <url>               # Navigate
agent-browser back                     # Go back
agent-browser forward                  # Go forward
agent-browser reload                   # Reload

Interaction

agent-browser click <sel>              # Click element
agent-browser fill <sel> <text>        # Clear and fill
agent-browser type <sel> <text>        # Type into element
agent-browser hover <sel>              # Hover
agent-browser scroll <dir> [px]        # Scroll up/down/left/right

Information

agent-browser get text <sel>           # Get text content
agent-browser get value <sel>          # Get input value
agent-browser get attr <sel> <attr>    # Get attribute
agent-browser get title                # Get page title

Wait Conditions

agent-browser wait <selector>          # Wait for element
agent-browser wait 5000                # Wait 5 seconds
agent-browser wait --text "Welcome"    # Wait for text
agent-browser wait --load networkidle  # Wait for network idle

Integration with Skills

Skill Integration Pattern

A skill can invoke Agent Browser commands:

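import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

// child_process functions are callback-based, so promisify before awaiting.
// execFile passes the URL as an argument, so it is never parsed by a shell.
const run = promisify(execFile);

async function scrapeForm(url: string): Promise<string> {
  await run('agent-browser', ['open', url]);
  const { stdout } = await run('agent-browser', ['snapshot', '-i']);
  // Parse the snapshot and extract refs here,
  // then execute actions using those refs.
  return stdout;
}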

Best Practices

  1. Always snapshot before acting: Get current state before making changes
  2. Use refs for reliability: Avoid brittle selectors
  3. Handle multiple sessions: Use sessions for parallel workflows
  4. Implement retry logic: Browser operations can fail
  5. Clean up resources: Always call agent-browser close when done
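
Practices 4 and 5 combine naturally: retry the flaky step, and close the browser in a `finally` block so cleanup runs whether or not the step succeeded. This is a sketch under stated assumptions: `withRetry` and `runWithCleanup` are hypothetical helpers, the attempt count and delays are arbitrary, and `step` and `close` stand in for real calls such as `agent-browser click @e2` and `agent-browser close`.

```typescript
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Retry an async operation, doubling the delay after each failure.
async function withRetry<T>(
  operation: () => Promise<T>,
  attempts: number = 3,
  delayMs: number = 200
): Promise<T> {
  let lastError: unknown = new Error("attempts must be at least 1");
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        await sleep(delayMs * Math.pow(2, attempt - 1));
      }
    }
  }
  throw lastError;
}

// Run a step with retries, then always run the cleanup callback.
async function runWithCleanup<T>(
  step: () => Promise<T>,
  close: () => Promise<unknown>,
  attempts: number = 3,
  delayMs: number = 200
): Promise<T> {
  try {
    return await withRetry(step, attempts, delayMs);
  } finally {
    await close();
  }
}

// Stand-in step that fails twice before succeeding.
let calls = 0;
const flaky = async (): Promise<string> => {
  calls += 1;
  if (calls < 3) throw new Error("element not ready");
  return "clicked";
};

runWithCleanup(flaky, async () => { console.log("closed"); }, 3, 10).then(
  (result) => console.log(result, "after", calls, "attempts")
);
// prints "closed", then "clicked after 3 attempts"
```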

Use Cases in Skills

Web Testing Skills

Skills that need to test web applications can use Agent Browser for:

  • Form validation
  • UI regression testing
  • Accessibility testing
  • Cross-browser testing

Data Extraction Skills

Skills that scrape data can leverage:

  • Semantic element selection
  • Dynamic content handling
  • Authentication state management
  • Rate limiting via session isolation

Monitoring Skills

Skills that monitor websites can use:

  • Scheduled checks
  • Screenshot capture
  • Content change detection
  • Error page detection
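
Content change detection can be as simple as comparing a fingerprint of the page text between scheduled checks. This is a minimal sketch, assuming the text comes from something like `agent-browser get text <sel>`. `normalize`, `fingerprint`, and `hasChanged` are hypothetical helpers, and the FNV-1a-style hash is used only because it is short and dependency-free. It is not collision-resistant, so a cryptographic hash such as SHA-256 is the safer choice where that matters.

```typescript
// Collapse whitespace so layout-only differences do not count as changes.
function normalize(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// 32-bit FNV-1a-style hash over UTF-16 code units (not bytes), as hex.
function fingerprint(text: string): string {
  const normalized = normalize(text);
  let hash = 0x811c9dc5;
  for (let i = 0; i < normalized.length; i++) {
    hash ^= normalized.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// The first check has no previous fingerprint, so it never reports a change.
function hasChanged(previous: string | undefined, currentText: string): boolean {
  return previous !== undefined && previous !== fingerprint(currentText);
}

const before = fingerprint("Status: OK\n  All systems operational");
console.log(hasChanged(before, "Status:  OK All systems operational")); // prints false
console.log(hasChanged(before, "Status: DEGRADED")); // prints true
```

Storing only the fingerprint between runs keeps the monitor's state small. Keep the previous text as well if the skill needs to report what changed.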

Advanced Features

Streaming (Live Preview)

Enable live browser preview:

AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com

Connect via WebSocket for frame streaming and input injection.

CDP Mode

Connect to existing browser instances:

agent-browser --cdp 9222 snapshot

Useful for controlling Electron apps or Chrome with remote debugging.

Custom Executable

Use lightweight browser builds:

AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open <url>

Useful for serverless deployments, where the default Chromium download may be too large.

Limitations

  • Requires Chromium download (~684MB) or custom executable
  • Runs headless by default; pass --headed to show the browser window
  • No built-in parallel execution (use multiple sessions)
  • Some Playwright features not exposed via CLI

Technical Details

  • Architecture: Rust CLI + Node.js daemon + Playwright
  • Platforms: macOS, Linux, Windows (native + fallback)
  • Browser Engine: Chromium (Playwright)
  • License: Apache-2.0

Any skill that needs browser interaction can integrate Agent Browser:

  • Web scraping skills: Extract data from websites
  • Testing skills: Automated UI/acceptance testing
  • Monitoring skills: Website health checks
  • Form automation skills: Data entry workflows

Example Skill Integration

See the official Agent Browser Skill for a complete example of how to integrate this tool into a Claude Code skill.