Agent Browser
Headless browser automation CLI for AI agents: fast Rust CLI with Node.js fallback, perfect for web scraping, testing, and browser interactions.
Overview
Agent Browser is a production-ready browser automation tool designed specifically for AI agents. It provides a clean CLI interface backed by Playwright, making it easy for AI systems to control browsers programmatically.
Why Use Agent Browser with Skills
Unlike manual testing tools, Agent Browser is optimized for AI agent workflows:
- Deterministic refs: Snapshot-based element selection for reliable AI interaction
- CLI-first design: Easy for agents to invoke commands
- Fast native binary: Rust CLI with Node.js fallback for speed and compatibility
- Multiple sessions: Isolated browser instances for parallel workflows
- Semantic locators: AI-friendly element selection by role, label, text
Installation
Quick Install
npm install -g agent-browser
agent-browser install # Download Chromium
From Source
git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
pnpm install
pnpm build
pnpm build:native # Requires Rust
pnpm link --global
agent-browser install
Core Workflow
The optimal AI workflow uses snapshots and refs:
# 1. Navigate and get snapshot
agent-browser open example.com
agent-browser snapshot -i
# Output:
# - heading "Example Domain" [ref=e1] [level=1]
# - button "Submit" [ref=e2]
# - textbox "Email" [ref=e3]
# 2. Use refs to interact
agent-browser click @e2
agent-browser fill @e3 "test@example.com"
# 3. Re-snapshot after changes
agent-browser snapshot -i
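A skill usually needs to turn the snapshot text into refs it can act on. The following is a minimal sketch, assuming each element line looks like the sample output above (a dash, a role, a quoted name, then `[ref=...]`). The names `SnapshotElement`, `parseSnapshot`, and `findRef` are hypothetical helpers, not part of Agent Browser.

```typescript
// One element line from `agent-browser snapshot -i` output.
interface SnapshotElement {
  role: string;
  name: string;
  ref: string;
}

// Matches lines such as: - button "Submit" [ref=e2]
// The line format is an assumption based on the sample output in this document.
const ELEMENT_LINE = /^\s*-\s+(\w+)\s+"([^"]*)"\s+\[ref=(\w+)\]/;

// Collect every line that carries a ref; lines that do not match are skipped.
function parseSnapshot(output: string): SnapshotElement[] {
  const elements: SnapshotElement[] = [];
  for (const line of output.split("\n")) {
    const match = ELEMENT_LINE.exec(line);
    if (match) {
      const [, role = "", name = "", ref = ""] = match;
      elements.push({ role, name, ref });
    }
  }
  return elements;
}

// Return the ref in the "@e2" form used by click/fill, or undefined if absent.
function findRef(elements: SnapshotElement[], role: string, name: string): string | undefined {
  const hit = elements.find((e) => e.role === role && e.name === name);
  return hit ? `@${hit.ref}` : undefined;
}
```

Because refs are only valid for the snapshot they came from, a helper like this would be re-run on each fresh snapshot rather than cached across page changes.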
Key Features
Snapshot-Based Navigation
Get an accessibility tree with deterministic refs:
agent-browser snapshot -i -c -d 5
Options:
- -i: Interactive elements only
- -c: Compact mode (remove empty elements)
- -d 5: Limit depth to 5 levels
- -s "#main": Scope to selector
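When an agent picks snapshot options at runtime, building the argument list from a typed object avoids string-concatenation mistakes. This is a sketch only: `SnapshotOptions` and `snapshotArgs` are hypothetical names, and only the four flags listed above are used.

```typescript
// Typed view of the snapshot flags documented above.
interface SnapshotOptions {
  interactiveOnly?: boolean; // -i
  compact?: boolean; // -c
  depth?: number; // -d <n>
  scope?: string; // -s <selector>
}

// Build the argument list that follows `agent-browser` on the command line.
function snapshotArgs(options: SnapshotOptions = {}): string[] {
  const args = ["snapshot"];
  if (options.interactiveOnly) args.push("-i");
  if (options.compact) args.push("-c");
  if (options.depth !== undefined) args.push("-d", String(options.depth));
  if (options.scope !== undefined) args.push("-s", options.scope);
  return args;
}
```

Passing the result as an argument array (rather than a single shell string) means a selector such as "#main" needs no extra quoting.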
Semantic Locators
Find elements by semantic meaning:
agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"
agent-browser find text "Sign In" click
Multiple Sessions
Run isolated browser instances:
agent-browser --session agent1 open site-a.com
agent-browser --session agent2 open site-b.com
Each session has separate cookies, storage, and auth state.
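One way to drive several sessions from a skill is to plan one command per session and run them concurrently. The sketch below assumes the `--session` flag placement shown above. `inSession`, `planParallelOpens`, and `runAll` are hypothetical helpers, and the runner is injected so the planning logic can be exercised without a browser.

```typescript
// Prefix a command with the --session flag.
function inSession(session: string, command: string[]): string[] {
  return ["--session", session, ...command];
}

// Assign one session per URL: agent1, agent2, ...
function planParallelOpens(urls: string[], prefix = "agent"): string[][] {
  return urls.map((url, index) => inSession(`${prefix}${index + 1}`, ["open", url]));
}

// Run every planned command concurrently through the injected runner.
// Results come back in the same order as the plans.
async function runAll(
  plans: string[][],
  run: (args: string[]) => Promise<string>,
): Promise<string[]> {
  return Promise.all(plans.map((args) => run(args)));
}
```

In a real skill the runner might wrap Node's `execFile("agent-browser", args)`; in tests it can be a stub that records the arguments it received.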
Authenticated Sessions
Skip login flows with scoped headers:
agent-browser open api.example.com --headers '{
"Authorization": "Bearer <token>"
}'
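Headers are scoped to the origin: they are sent only with requests to api.example.com and not to other domains, so the token is not leaked to third parties.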
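Hand-writing the JSON for `--headers` is easy to get wrong once tokens contain quotes or other special characters. A minimal sketch, assuming the `open <url> --headers <json>` form shown above; `openWithAuthArgs` is a hypothetical helper name.

```typescript
// Build the arguments for an authenticated `open`, serializing headers with
// JSON.stringify so quoting and escaping are always valid JSON.
function openWithAuthArgs(url: string, token: string): string[] {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${token}`,
  };
  return ["open", url, "--headers", JSON.stringify(headers)];
}
```

The token would normally come from a secret store or environment variable rather than being hard-coded in the skill.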
Common Commands
Navigation
agent-browser open <url> # Navigate
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload
Interaction
agent-browser click <sel> # Click element
agent-browser fill <sel> <text> # Clear and fill
agent-browser type <sel> <text> # Type into element
agent-browser hover <sel> # Hover
agent-browser scroll <dir> [px] # Scroll up/down/left/right
Information
agent-browser get text <sel> # Get text content
agent-browser get value <sel> # Get input value
agent-browser get attr <sel> <attr> # Get attribute
agent-browser get title # Get page title
Wait Conditions
agent-browser wait <selector> # Wait for element
agent-browser wait 5000 # Wait 5 seconds
agent-browser wait --text "Welcome" # Wait for text
agent-browser wait --load networkidle # Wait for network idle
Integration with Skills
Skill Integration Pattern
A skill can invoke Agent Browser commands:
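import { execFile } from 'child_process';
import { promisify } from 'util';
// execFile is callback-based; promisify it so it can be awaited
const run = promisify(execFile);
async function scrapeForm(url: string) {
  // Pass the URL as an argument (no shell) to avoid command injection
  await run('agent-browser', ['open', url]);
  const { stdout } = await run('agent-browser', ['snapshot', '-i']);
  // Parse snapshot, extract refs
  // Execute actions using refs
  return stdout;
}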
Best Practices
- Always snapshot before acting: Get current state before making changes
- Use refs for reliability: Avoid brittle selectors
- Handle multiple sessions: Use sessions for parallel workflows
- Implement retry logic: Browser operations can fail
- Clean up resources: Always call agent-browser close when done
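The retry and cleanup practices above can be sketched as two small wrappers. This is one possible approach, not something Agent Browser provides: `withRetry` and `withCleanup` are hypothetical helpers, and the attempt count and delays are arbitrary defaults.

```typescript
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry an async operation with exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
// The last error is rethrown once all attempts have failed.
async function withRetry<T>(
  operation: () => Promise<T>,
  attempts = 3,
  baseMs = 250,
  sleep: Sleep = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Do not sleep after the final failed attempt.
      if (attempt < attempts - 1) await sleep(baseMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Run a task, then always run cleanup (for example `agent-browser close`),
// whether the task succeeded or threw.
async function withCleanup<T>(
  task: () => Promise<T>,
  cleanup: () => Promise<unknown>,
): Promise<T> {
  try {
    return await task();
  } finally {
    await cleanup();
  }
}
```

A skill might wrap each click or fill in `withRetry` and the whole workflow in `withCleanup`, so the browser is closed even when a step fails for good.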
Use Cases in Skills
Web Testing Skills
Skills that need to test web applications can use Agent Browser for:
- Form validation
- UI regression testing
- Accessibility testing
- Cross-browser testing
Data Extraction Skills
Skills that scrape data can leverage:
- Semantic element selection
- Dynamic content handling
- Authentication state management
- Rate limiting via session isolation
Monitoring Skills
Skills that monitor websites can use:
- Scheduled checks
- Screenshot capture
- Content change detection
- Error page detection
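Content change detection can be as simple as comparing the text captured on two checks (for instance, two runs of `agent-browser get text <sel>`). The following is a sketch under that assumption. `diffText` and `hasChanged` are hypothetical helpers, and the comparison is set-based, so reordered or duplicated lines are not reported.

```typescript
interface TextDiff {
  added: string[];
  removed: string[];
}

// Compare two captures line by line, ignoring blank lines and surrounding whitespace.
function diffText(before: string, after: string): TextDiff {
  const normalize = (text: string): string[] =>
    text
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0);
  const oldLines = normalize(before);
  const newLines = normalize(after);
  const oldSet = new Set(oldLines);
  const newSet = new Set(newLines);
  return {
    added: newLines.filter((line) => !oldSet.has(line)),
    removed: oldLines.filter((line) => !newSet.has(line)),
  };
}

// True when at least one line appeared or disappeared between captures.
function hasChanged(diff: TextDiff): boolean {
  return diff.added.length > 0 || diff.removed.length > 0;
}
```

A monitoring skill could store the previous capture between scheduled checks and raise an alert only when `hasChanged` returns true.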
Advanced Features
Streaming (Live Preview)
Enable live browser preview:
AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
Connect via WebSocket for frame streaming and input injection.
CDP Mode
Connect to existing browser instances:
agent-browser --cdp 9222 snapshot
Useful for controlling Electron apps or Chrome with remote debugging.
Custom Executable
Use lightweight browser builds:
AGENT_BROWSER_EXECUTABLE_PATH=/path/to/chromium agent-browser open
Perfect for serverless deployments.
Limitations
- Requires Chromium download (~684MB) or custom executable
- Headless mode only (unless the --headed flag is used)
- No built-in parallel execution (use multiple sessions)
- Some Playwright features not exposed via CLI
Technical Details
- Architecture: Rust CLI + Node.js daemon + Playwright
- Platforms: macOS, Linux, Windows (native + fallback)
- Browser Engine: Chromium (Playwright)
- License: Apache-2.0
Related Skills
Any skill that needs browser interaction can integrate Agent Browser:
- Web scraping skills: Extract data from websites
- Testing skills: Automated UI/acceptance testing
- Monitoring skills: Website health checks
- Form automation skills: Data entry workflows
Example Skill Integration
See the official Agent Browser Skill for a complete example of how to integrate this tool into a Claude Code skill.
