My Journey Building a Sales Tracker Agent

Scraping websites might sound like a straightforward programming challenge: send requests, get data, parse HTML. But anyone who’s ventured into the world of scraping knows it’s more of a cat-and-mouse game. And if you’re trying to scrape Amazon? It’s on another level. This is the story of how I went from abandoned ideas to a fully working solution - a journey marked by dead ends, unexpected insights, and relentless problem-solving.

The Spark That Never Left

This idea first came to me when I had just started programming. I was fascinated by the practical side of web scraping, and Amazon was a natural first target: fetch the price of a product every day, and send a notification if something was on sale. But I had no idea how aggressive their anti-bot defenses were. They chewed me up and spat me out. I gave up. But the idea never really left me.

A New Angle with Relevance AI

A few years later, I started wondering again: what if I gave it another go, this time powered by agents built with Relevance AI? The idea was to scrape sites, then hand off the raw data to an agent that could intelligently extract and process it, the agentic way. I began with Firecrawl, an easy entry point for scraping. But the native integration available on Relevance had limitations, especially when it came to control and customization.

Python First… but JavaScript Wins for Scraping

I initially reached for Python, which had better support and more flexibility on Relevance AI. But it didn't take long to hit a wall. Python works great for static pages, but struggles with dynamic content rendered through JavaScript. So I switched to JavaScript, which gave me access to the full DOM, including all the dynamic elements that were invisible to Python-based tools. This shift made a huge difference, and I could finally access the content I needed.
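To make the difference concrete, here's a minimal sketch of reading a JavaScript-rendered page from Node. The post doesn't name a specific library, so Puppeteer, the URL, and the selector below are illustrative stand-ins; the point is simply that the page finishes rendering before the DOM is read.

// Minimal sketch: read dynamically rendered content via a real browser DOM.
// Puppeteer, the URL, and the selector are illustrative stand-ins.
const puppeteer = require('puppeteer');

async function scrapeRenderedText(url, selector) {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so client-side rendering has finished.
    await page.goto(url, { waitUntil: 'networkidle2' });
    // Read text from the live DOM, including elements injected by JavaScript.
    return await page.$eval(selector, (el) => el.textContent.trim());
  } finally {
    await browser.close();
  }
}

scrapeRenderedText('https://example.com/product', '#price')
  .then((text) => console.log(text))
  .catch(console.error);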

The Big Realization: Stop Hiding, Start Identifying

As I pushed forward, the next challenge was Amazon's bot detection. My first instinct was to hide my tracks: rotating proxies, spoofing headers, masking user agents. But it was like playing whack-a-mole. Amazon's defenses are built to detect exactly that kind of behavior.

Then a different idea hit me: what if I didn’t hide at all?

I tried identifying myself as a real user. I took the headers and cookies from my logged-in Amazon browser session and included them in the scraper. That one shift - treating the scraper like a legitimate browser session - completely changed the game. Suddenly, requests were going through without resistance. I had reliable access to the data.
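In practice this just meant forwarding the browser's own request headers. Here's a rough sketch of the idea; every value shown is a placeholder rather than anything I actually used, and the cookie string would come from your own logged-in session via the browser's dev tools.

// Sketch: reuse headers and cookies copied from a logged-in browser session.
// All values below are placeholders - paste your own from the browser's dev tools.
const SESSION_HEADERS = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...', // copied from the browser
  'Accept-Language': 'en-AU,en;q=0.9',
  'Cookie': 'session-id=...; session-token=...',                  // from the logged-in session
};

async function fetchProductPage(url) {
  // The request now looks like the same browser session, not a bare bot.
  const res = await fetch(url, { headers: SESSION_HEADERS });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.text(); // raw HTML to hand off to the agent
}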

The Final Push: Firecrawl, Fine-Tuning, and Fitting the LLM

While that approach worked, it wasn’t scalable! I didn’t want other users to have to extract their own cookies or manually set up headers. So I returned to Firecrawl, but with a new approach: using their API directly rather than the native integration inside Relevance AI.

This gave me the flexibility I needed. I even got in touch with the Firecrawl team, and they made backend changes to support scraping amazon.com.au.
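For reference, a direct call to Firecrawl's scrape API looks roughly like the sketch below. I'm writing it from memory of their v1 REST endpoint, so treat the exact path, parameters, and response shape as assumptions to check against their docs.

// Rough sketch of calling Firecrawl's scrape API directly (v1 endpoint assumed).
// The request and response shapes here should be verified against Firecrawl's docs.
async function firecrawlScrape(url) {
  const res = await fetch('https://api.firecrawl.dev/v1/scrape', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.FIRECRAWL_API_KEY}`,
      'Content-Type': 'application/json',
    },
    // Ask for raw HTML so the agent can pick out the element it needs.
    body: JSON.stringify({ url, formats: ['html'] }),
  });
  if (!res.ok) throw new Error(`Firecrawl request failed: ${res.status}`);
  const json = await res.json();
  return json.data && json.data.html; // assumed response shape: { data: { html, ... } }
}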

Once the data started flowing, a new problem surfaced: Amazon pages are massive - far too large to fit into a language model's context window.

To solve that, I fine-tuned the scraper and agent to extract only the core content - a single, static HTML element that reliably contained the data I needed. Something small, predictable, and not dynamically generated.
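As an illustration of the kind of trimming involved, here's a sketch using cheerio to keep just one element from the page before it reaches the model. The '#core-price' selector is a made-up example; the real one depends on which element of the product page reliably holds the price.

// Sketch: reduce a huge page to one small, predictable element before the LLM sees it.
// The '#core-price' selector is illustrative; substitute the element you actually target.
const cheerio = require('cheerio');

function extractCoreContent(html, selector = '#core-price') {
  const $ = cheerio.load(html);
  const fragment = $(selector).html();      // keep only the chosen element's markup
  return fragment ? fragment.trim() : null; // small enough to fit in the context window
}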

That made all the difference.

The output was finally small enough for the LLM to process, allowing it to fetch the price agentically as part of a broader reasoning workflow, removing the need for hard-coded logic.

After countless dead ends and pivots, I finally had a solution that worked - robust, repeatable, and scalable.
