
Browser Automation

Use Playwright to automate websites that don’t have APIs — Goodreads, Facebook, etc.

Setup

# Create shared venv for all scripts
python3 -m venv ~/.openclaw/common-scripts/.venv
source ~/.openclaw/common-scripts/.venv/bin/activate
pip install playwright playwright-stealth
playwright install chromium

Anti-bot Protection

Websites detect headless browsers. Essential countermeasures:

import os

from playwright.async_api import async_playwright
from playwright_stealth import stealth_async

playwright = await async_playwright().start()
context = await playwright.chromium.launch_persistent_context(
    # Playwright does not expand "~"; expand it explicitly
    user_data_dir=os.path.expanduser(
        "~/.openclaw/common-scripts/<service>/.browser-data"),
    headless=True,
    viewport={"width": 1280, "height": 800},
    user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
               "AppleWebKit/537.36 (KHTML, like Gecko) "
               "Chrome/131.0.0.0 Safari/537.36",
    args=["--disable-blink-features=AutomationControlled", "--no-sandbox"],
    ignore_default_args=["--enable-automation"],
)

# Apply the stealth plugin to the context's default page
page = context.pages[0]
await stealth_async(page)

Key techniques:

- Persistent user_data_dir so cookies and login sessions survive between runs
- Realistic viewport and a current desktop Chrome user agent
- --disable-blink-features=AutomationControlled plus dropping --enable-automation, which hides the navigator.webdriver flag
- playwright-stealth to patch the remaining headless fingerprints
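The patterns below call a create_browser_context() helper that is not defined on this page. A minimal sketch combining the settings above might look like this (the helper name, the service argument, and the path layout are assumptions; imports are deferred so the module loads even without Playwright installed):

```python
import os

async def create_browser_context(headless=True, service="goodreads"):
    # Hypothetical helper: one persistent, stealth-patched context per service.
    # Imports are deferred so this module can be imported without Playwright.
    from playwright.async_api import async_playwright
    from playwright_stealth import stealth_async

    pw = await async_playwright().start()
    context = await pw.chromium.launch_persistent_context(
        user_data_dir=os.path.expanduser(
            f"~/.openclaw/common-scripts/{service}/.browser-data"),
        headless=headless,
        viewport={"width": 1280, "height": 800},
        args=["--disable-blink-features=AutomationControlled", "--no-sandbox"],
        ignore_default_args=["--enable-automation"],
    )
    if not context.pages:
        await context.new_page()
    await stealth_async(context.pages[0])
    return context
```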

Pattern: Login Once, Automate Forever

# First time: headless=False, user logs in manually
# After that: headless=True, cookies persist

async def cmd_login():
    context = await create_browser_context(headless=False)  # Shows browser
    page = context.pages[0]
    await page.goto("https://service.com/login")
    input("Press Enter after logging in...")  # Wait for user
    await context.close()  # Cookies saved!

async def cmd_action():
    context = await create_browser_context(headless=True)   # No browser window
    # Cookies loaded automatically from persistent context
    page = context.pages[0]
    await page.goto("https://service.com/protected-page")
    # ... do stuff
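A script usually exposes these as subcommands. A minimal argparse dispatcher might look like this (the command and flag names are illustrative, and cmd_login/cmd_action are the coroutines above):

```python
import argparse
import asyncio

def build_parser():
    # Hypothetical CLI: one subcommand per automation action
    parser = argparse.ArgumentParser(description="Service automation tool")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("login", help="Open a visible browser for manual login")
    rate = sub.add_parser("rate", help="Rate a book")
    rate.add_argument("--book-id", required=True)
    rate.add_argument("--stars", type=int, choices=range(1, 6), required=True)
    return parser

def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "login":
        asyncio.run(cmd_login())   # defined above
    elif args.command == "rate":
        asyncio.run(cmd_action())  # defined above
```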

Pattern: JSON Output for Agent Integration

All automation scripts should output JSON:

import json, sys

def result_json(success, action, **kwargs):
    data = {"success": success, "action": action, **kwargs}
    print(json.dumps(data, ensure_ascii=False, indent=2))
    sys.exit(0 if success else 1)

# Usage
result_json(True, "rate", book_id="123", stars=5,
            message="Rated 5 stars for 'Project Hail Mary'")

Agent reads JSON output to confirm success and report to user.
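On the agent side, that means the script can be run as a subprocess and its stdout parsed directly. A small sketch (the function name and timeout are assumptions):

```python
import json
import subprocess

def run_tool(cmd, timeout=120):
    # Run an automation script and parse its JSON stdout.
    # A non-zero exit code still carries a JSON body describing the failure.
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return json.loads(proc.stdout)
```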

Pattern: RSS Verification

After browser actions (shelf change, rating), verify via RSS/API:

def verify_shelf_rss(user_id, book_id, expected_shelf):
    import urllib.request, xml.etree.ElementTree as ET
    url = f"https://www.goodreads.com/review/list_rss/{user_id}?shelf={expected_shelf}"
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as r:
        root = ET.fromstring(r.read().decode("utf-8"))
    for item in root.findall(".//item"):
        if item.findtext("book_id") == str(book_id):
            return True
    return False
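RSS feeds can lag a few seconds behind a write, so it is worth polling with a short delay rather than checking once. A generic sketch (the attempt count and delay are arbitrary):

```python
import time

def poll_until(check, attempts=5, delay=3.0):
    # Call check() up to `attempts` times, sleeping between tries.
    # Returns True as soon as check() succeeds, False if it never does.
    for i in range(attempts):
        if check():
            return True
        if i < attempts - 1:
            time.sleep(delay)
    return False

# Usage:
# ok = poll_until(lambda: verify_shelf_rss(user_id, book_id, "read"))
```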

Gotchas

DOM Selectors Change

Websites update their HTML frequently. Defensive strategies:

- Prefer stable hooks (IDs, data-* attributes, ARIA labels) over deep CSS class chains
- Keep an ordered list of fallback selectors and try each in turn
- On failure, dump a screenshot and the page HTML so the broken selector is easy to diagnose
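One way to soften selector breakage is to try several candidates in order, from most to least stable. A sketch (the selectors in the usage note are made-up examples):

```python
async def query_with_fallbacks(page, selectors):
    # Try each selector in order; return the first element that matches.
    for sel in selectors:
        el = await page.query_selector(sel)
        if el is not None:
            return el
    raise RuntimeError(f"No selector matched: {selectors}")

# Usage:
# btn = await query_with_fallbacks(page, [
#     "[data-testid='rate-button']",   # most stable
#     "button.rateButton",             # current class name
#     "text=Rate this book",           # last resort
# ])
```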

Screenshots Must Be in Workspace

OpenClaw sandbox restricts file access. Screenshots must be saved to agent workspace:

--shots-dir ~/.openclaw/workspace-personal/temp-screenshots
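A small helper keeps every screenshot inside the allowed workspace directory and timestamps it (the naming scheme is an assumption):

```python
import os
import time

def shot_path(shots_dir, label):
    # Build a timestamped PNG path inside the allowed workspace directory.
    base = os.path.expanduser(shots_dir)
    os.makedirs(base, exist_ok=True)
    return os.path.join(base, f"{label}-{int(time.time())}.png")

# Usage with Playwright:
# await page.screenshot(path=shot_path(args.shots_dir, "login-failed"))
```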

Rate Limiting

await page.wait_for_timeout(2000)  # 2s between actions

Don’t hammer — Goodreads/Facebook will block you.
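A perfectly regular 2-second cadence is itself a bot signal; a little jitter makes the timing look more human (base and jitter values here are arbitrary):

```python
import random

def humanized_delay_ms(base=2000, jitter=1000):
    # Random delay in milliseconds between base and base + jitter
    return base + random.uniform(0, jitter)

# Usage:
# await page.wait_for_timeout(humanized_delay_ms())
```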

Goodreads-Specific: Edit Page Redirect

Goodreads redirects /review/edit/{id} to /review/new/{id} for books without prior reviews. Accept both URL patterns:

is_review_page = "/review/edit" in url or "/review/new" in url

Next: Cron Jobs & Scheduling →