Kenneth Tse

My Journey Building a Sales Tracker Agent

Scraping websites might sound like a straightforward programming challenge: send requests, get data, parse HTML. But anyone who’s ventured into the world of scraping knows it’s more of a cat-and-mouse game. And if you’re trying to scrape Amazon? It’s on another level. This is the story of how I went from abandoned ideas to a fully working solution - a journey marked by dead ends, unexpected insights, and relentless problem-solving.

The Spark That Never Left

This idea first came to me when I had just started programming. I was fascinated by the practical side of web scraping, and Amazon was a natural first target: fetch the price of a product every day, and send a notification if something was on sale. But I had no idea how aggressive their anti-bot defenses were. They chewed me up and spat me out. I gave up, but the idea never really left me.

A New Angle with Relevance AI

A few years later, I started wondering again: what if I gave it another go, this time powered by agents built with Relevance AI? The idea was to scrape sites, then hand off the raw data to an agent that could intelligently extract and process it - the agentic way. I began with Firecrawl, an easy entry point for scraping. But the native integration available on Relevance AI had limitations, especially when it came to control and customization.

Python First… but JavaScript Wins for Scraping

I initially reached for Python, which had better support and more flexibility on Relevance AI. But it didn't take long to hit a wall. Python works great for static pages, but struggles with dynamic content rendered through JavaScript. So I switched to JavaScript, which gave me access to the full DOM, including all the dynamic elements that were invisible to Python-based tools. This shift made a huge difference, and I could finally access the content I needed.
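I won't reproduce my actual Relevance AI tool step here, but the general technique is to let the page render first and only then read the DOM, so JavaScript-generated elements actually exist when you query them. A rough sketch (Puppeteer is just an illustrative choice here, not necessarily what runs under the hood):

```javascript
// Sketch of the general technique: render the page, then read the DOM,
// so dynamically generated elements are present when queried.
// (Puppeteer is used purely as an illustration.)
const puppeteer = require('puppeteer');

async function getRenderedHtml(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' }); // wait for dynamic content to load
  const html = await page.content();                   // the full, rendered DOM
  await browser.close();
  return html;
}
```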

The Big Realization: Stop Hiding, Start Identifying

As I pushed forward, the next challenge was Amazon's bot detection. My first instinct was to hide my tracks: rotating proxies, spoofing headers, and masking user agents. But it was like playing whack-a-mole; Amazon's defenses are built to detect exactly that kind of behavior.

Then a different idea hit me: what if I didn’t hide at all?

I tried identifying myself as a real user. I took the headers and cookies from my logged-in Amazon browser session and included them in the scraper. That one shift - treating the scraper like a legitimate browser session - completely changed the game. Suddenly, requests were going through without resistance. I had reliable access to the data.
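To make that concrete, here's a deliberately simplified sketch in Node-style JavaScript. Every header and cookie value below is a placeholder - the real ones come straight out of your own browser's network inspector:

```javascript
// Simplified sketch: send the same headers and cookies a real, logged-in
// browser session sends, so the request identifies itself rather than hiding.
// All values are placeholders copied from a browser's network inspector.
async function fetchAsLoggedInUser(url) {
  const response = await fetch(url, {
    headers: {
      'User-Agent': 'Mozilla/5.0 ...',              // the browser's actual UA string
      'Accept-Language': 'en-AU,en;q=0.9',
      'Cookie': 'session-id=...; session-token=...' // cookies from the logged-in session
    }
  });
  return await response.text();
}
```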

The Final Push: Firecrawl, Fine-Tuning, and Fitting the LLM

While that approach worked, it wasn’t scalable! I didn’t want other users to have to extract their own cookies or manually set up headers. So I returned to Firecrawl, but with a new approach: using their API directly rather than the native integration inside Relevance AI.

This gave me the flexibility I needed. I even got in touch with the Firecrawl team, and they made backend changes to support scraping amazon.com.au.
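For reference, calling the API directly looks roughly like this. The endpoint and payload shape follow Firecrawl's public scrape API as I understood it at the time - check their current docs before copying:

```javascript
// Calling Firecrawl's scrape API directly instead of the native integration.
// Endpoint and payload are based on Firecrawl's public docs at the time of
// writing; the API may have changed since.
async function scrapeWithFirecrawl(productUrl) {
  const response = await fetch('https://api.firecrawl.dev/v1/scrape', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.FIRECRAWL_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      url: productUrl,
      formats: ['html']   // only the HTML is needed for the extraction step later
    })
  });
  const result = await response.json();
  return result.data.html;
}
```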

Once the data started flowing, a new problem surfaced: Amazon pages are massive - far too large to fit into a language model's context window.

To solve that, I tuned the scraper and agent to extract only the core content - a single, static HTML element that reliably contained the data I needed: something small, predictable, and not dynamically generated.
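The exact element I landed on doesn't matter much (the selector below is only an example), but the shape of the extraction step looks something like this:

```javascript
// Cut the huge product page down to one small, stable element before it ever
// reaches the model. The selector is an example only - use whichever element
// proves reliably present and compact.
const cheerio = require('cheerio');

function extractCoreContent(html) {
  const $ = cheerio.load(html);
  const priceBlock = $('#corePrice_feature_div').html(); // example selector
  return priceBlock ? priceBlock.trim() : null;          // small enough for a context window
}
```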

That made all the difference.

The output was finally small enough for the LLM to process, allowing it to fetch the price agentically as part of a broader reasoning workflow, removing the need for hard-coded logic.
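In spirit, the agent's final step boils down to a prompt like the one below - a simplified sketch of the idea, not Relevance AI's actual interface:

```javascript
// Simplified sketch of the final step: hand the LLM only the trimmed snippet
// and let it return the price as structured output, instead of hard-coding a parser.
function buildPricePrompt(snippetHtml) {
  return [
    'You are a price-tracking assistant.',
    'From the HTML snippet below, return the current product price as JSON:',
    '{ "price": number, "currency": string }',
    '',
    snippetHtml
  ].join('\n');
}
```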

After countless dead ends and pivots, I finally had a solution that worked - robust, repeatable, and scalable.

Kenneth Tse

The beginning

A quick intro to why I’m starting this blog - a space to track my journey, share my work, and explore ideas as I build.

Hi there and welcome!

I’ve decided to start this blog as a way to document and share my journey in building things. Over time, I’ve found myself constantly experimenting with tools, frameworks, and ideas — sometimes for fun, sometimes for learning, and often just to see what I can create. This blog is my little corner of the internet to capture all of that.

There are a few reasons I’m doing this:

  • To document my progress. Whether it’s a full project or just a small experiment, writing things down helps me reflect on what I’ve learned and how far I’ve come.

  • To share and showcase. Some of the stuff I build might be rough, unfinished, or just for fun — but I still want to put it out there. This blog gives me a place to show off the work I’m doing, no matter the scale.

  • To build my portfolio. Over time, this blog will act as a kind of living portfolio — a growing archive of projects, experiments, and ideas that show what I’m into and what I’m capable of.

I'm not sure yet how often I'll post, or what format everything will take. Currently, most of my interest lies in AI tools, but that could shift as I go.

For now, I’m just excited to have a space to create and share.
