Computers controlling web browsers
Amazon launched Nova Act yesterday, which is their “control a web browser with an LLM” solution. It’s comparable to OpenAI’s Operator, Claude Computer Use, and Open Interpreter (the last is my favorite, and open-source).
I don’t get too excited about these anymore. They make for really fun one-off demos, but for repeated tasks they’re slow, unreliable, fall out of date, and get blocked by captchas and other anti-bot tech.
I’ve worked with tons of web browser automation tools, via LLMs, via code, for testing my apps, for scraping other websites, you name it: Playwright, Puppeteer, Rabbit r1 (lol), plus hosted solutions like BrowserBase, Checkly, and Anon (the last is my favorite).
My money says websites that want to be accessed by agents/LLMs will expose APIs or add accessibility flags that make using a full-blown web browser unnecessary. Web browsers are kind of bloated and meant for humans. APIs are good, clean interfaces for machines.
Websites that don’t want to be controlled by agents have already implemented aggressive anti-bot tech, and they don’t have an API for a reason. Think Ticketmaster. Every website behind Cloudflare will eventually serve you a captcha.
Hosted browser services use all kinds of tricks to get around this: fingerprint spoofing, residential proxies, captcha farming, faux clicks, etc. These are the same tactics employed by spammers. This is a tough problem.
So I think these headed and headless browser solutions will work well as a sort of stopgap for sites without APIs that don’t mind bots visiting, but I’m not expecting much more than that. It’s just way faster to control Spotify using API calls than via any of their user interfaces. I’m still gonna follow along with computer-use developments, but I rate it “idk, depends on what you’re doing” for now.
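To make the Spotify comparison concrete, here’s a sketch (Python, standard library only) of what “pause playback” looks like as an API call. It targets Spotify’s Web API pause endpoint; the access token is a placeholder, and the request is only constructed here, not actually sent.

```python
import urllib.request

# Pausing playback via the Spotify Web API is one authenticated HTTP request.
# (Placeholder token; a real one comes from Spotify's OAuth flow.)
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"

req = urllib.request.Request(
    url="https://api.spotify.com/v1/me/player/pause",
    method="PUT",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)

# Compare with a browser agent's flow: launch a browser, load the web
# player, wait for JS, find the pause button, click it, hope no captcha.
# urllib.request.urlopen(req)  # would send it, given a real token
print(req.method, req.full_url)
```

One request, no rendering, no DOM, no anti-bot gauntlet. That’s the gap I don’t see browser agents closing.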