Price Stalker — Le Quang Bui

Overview & problem

Prices on e-commerce platforms change constantly, and the only way to catch a real deal is to keep checking, which nobody does. Price Stalker watches products for you: it scrapes listings on a schedule, stores the price history, and emails you as soon as a scrape finds a price below your threshold. I designed, built, deployed and operated the whole system myself.
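
The alert decision itself is small. The sketch below illustrates it and is not the project's code. The class name PriceAlertRule and its method are hypothetical. It also assumes something the text does not state: an alert fires only when a price crosses below the threshold, so a product that stays cheap does not trigger an email on every scrape.

```java
import java.math.BigDecimal;

/**
 * Hypothetical helper: decides whether a freshly scraped price should
 * trigger an alert for a user's threshold. BigDecimal avoids float rounding on money.
 */
final class PriceAlertRule {
    private PriceAlertRule() {}

    /**
     * Returns true when the new price is strictly below the threshold and the
     * previous observation was not (or there was none). Assumed edge-triggered
     * behaviour: a price that stays below the threshold does not re-alert.
     */
    static boolean shouldAlert(BigDecimal previous, BigDecimal current, BigDecimal threshold) {
        if (current == null || current.compareTo(threshold) >= 0) {
            return false; // no price scraped, or still at/above the threshold
        }
        // First observation, or the previous price was at/above the threshold: a new drop.
        return previous == null || previous.compareTo(threshold) >= 0;
    }
}
```

Comparing with compareTo rather than equals means "100" and "100.00" count as the same price.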

Architecture

                 ┌─────────────────────────────────────────────┐
                 │                  AWS (ECS/EC2)              │
React SPA ──────▶│  Spring Boot API ──▶ PostgreSQL (RDS)       │
 (S3 + CF)       │   │  Spring Security (JWT)                  │
                 │   ▼                                         │
                 │  RabbitMQ ◀── Scheduled scrapers            │
                 │   │           (Selenium · Jsoup · Scrapy)   │
                 │   ▼                                         │
                 │  Notification worker ──▶ SMTP (alerts)      │
                 │  AI microservices: chat · speech-to-text ·  │
                 │  product-alternative recommendations        │
                 └─────────────────────────────────────────────┘
        CI/CD: GitHub Actions ▶ ECR ▶ ECS · Secrets Manager · Cloudflare DNS
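
The queue hop in the diagram can be sketched in-process. This stand-in rests on assumptions and is not the deployed code. A java.util.concurrent LinkedBlockingQueue plays the RabbitMQ broker. The PriceDrop message and the AlertPipeline class are hypothetical. Requeueing a failed send with an attempt counter approximates broker redelivery. The retry limit and the dead-letter list are also assumptions: RabbitMQ offers dead-letter exchanges for this, but the source does not say whether they are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

/**
 * In-process stand-in for the RabbitMQ hop between scrapers and the
 * notification worker. Message fields and MAX_ATTEMPTS are hypothetical.
 */
final class AlertPipeline {
    /** Hypothetical message a scraper publishes when it detects a drop. */
    record PriceDrop(String productId, String email, String price, int attempt) {}

    static final int MAX_ATTEMPTS = 3;

    private final BlockingQueue<PriceDrop> queue = new LinkedBlockingQueue<>();
    final List<PriceDrop> deadLettered = new ArrayList<>();

    /** Producer side: scrapers enqueue and move on; they never wait for SMTP. */
    void publish(String productId, String email, String price) {
        queue.add(new PriceDrop(productId, email, price, 1));
    }

    /**
     * Consumer side: drain the queue, handing each message to the sender.
     * A failed send is requeued with attempt + 1 until MAX_ATTEMPTS, then
     * parked in deadLettered instead of being silently dropped.
     */
    int drain(Predicate<PriceDrop> sender) {
        int sent = 0;
        PriceDrop msg;
        while ((msg = queue.poll()) != null) {
            if (sender.test(msg)) {
                sent++;
            } else if (msg.attempt() < MAX_ATTEMPTS) {
                queue.add(new PriceDrop(msg.productId(), msg.email(), msg.price(), msg.attempt() + 1));
            } else {
                deadLettered.add(msg);
            }
        }
        return sent;
    }
}
```

Scrapers only call publish, so a slow or failing mail server never blocks a scrape run. Retries happen only in the worker's drain loop.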

Key technical decisions

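RabbitMQ between scraping and alerting. Scrape runs are bursty, so a queue decouples the scrapers (producers) from the notification worker instead of tying both to one request cycle. The queue absorbs bursts. Because RabbitMQ redelivers messages that were never acknowledged, a failed alert can be retried rather than lost.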
Three scraping strategies. Selenium handles JavaScript-heavy storefronts, Jsoup does fast static parsing, and Scrapy covers crawl-style collection. The tool is picked per target site rather than forcing one tool to do everything.
Mixed schema design. Normalized core entities for integrity, with selectively denormalized read paths for the price-history queries the UI hits hardest.
Stateless auth. Spring Security filter chain issuing JWTs, so API nodes scale horizontally with no session affinity.
Secrets Manager over env files. Credentials never live in images or repos. CI/CD assumes IAM roles instead of holding stored keys.
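
Per-site tool selection comes down to a lookup from storefront host to strategy. This sketch assumes host-based dispatch with a default fallback. ScraperRegistry, its Strategy enum, and the example hosts are hypothetical. It leaves out the actual Selenium, Jsoup, or Scrapy invocation. The write-up does not describe how a Scrapy (Python) crawl is launched from the Java side.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical per-site strategy registry. The three kinds mirror the tools
 * named above; only the selection logic is shown.
 */
final class ScraperRegistry {
    enum Strategy { SELENIUM, JSOUP, SCRAPY }

    private final Map<String, Strategy> byHost = new HashMap<>();
    private final Strategy fallback;

    ScraperRegistry(Strategy fallback) {
        this.fallback = fallback;
    }

    /** Record which tool handles a given storefront host. */
    ScraperRegistry register(String host, Strategy strategy) {
        byHost.put(host.toLowerCase(Locale.ROOT), strategy);
        return this;
    }

    /** Pick the strategy for a product URL by its host, ignoring a leading "www.". */
    Strategy strategyFor(String productUrl) {
        String host = URI.create(productUrl).getHost(); // throws on malformed URLs
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + productUrl);
        }
        host = host.toLowerCase(Locale.ROOT);
        if (host.startsWith("www.")) {
            host = host.substring(4);
        }
        return byHost.getOrDefault(host, fallback);
    }
}
```

A registry keeps per-site choices in data rather than branching logic. When a storefront becomes JavaScript-heavy, only its registration line changes.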

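The stateless-auth point is easiest to see in the token format. Any API node holding the signing key can verify a JWT without shared session state. The sketch below signs and verifies a token using only the JDK (javax.crypto.Mac and java.util.Base64). It assumes a shared symmetric key (HS256), which the source does not specify. It is not the project's Spring Security configuration. TinyJwt is hypothetical, the claims are passed as a ready-made JSON string, and expiry and header validation are omitted.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Minimal HS256 JWT sign/verify; illustration only, no expiry or header checks. */
final class TinyJwt {
    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder B64D = Base64.getUrlDecoder();
    private static final String HEADER = "{\"alg\":\"HS256\",\"typ\":\"JWT\"}";

    private final byte[] key;

    TinyJwt(byte[] key) {
        this.key = key.clone();
    }

    /** HMAC-SHA256 over the ASCII signing input "header.payload". */
    private byte[] hmac(String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.US_ASCII));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds header.payload.signature, each part base64url-encoded without padding. */
    String sign(String claimsJson) {
        String signingInput = B64.encodeToString(HEADER.getBytes(StandardCharsets.UTF_8))
                + "." + B64.encodeToString(claimsJson.getBytes(StandardCharsets.UTF_8));
        return signingInput + "." + B64.encodeToString(hmac(signingInput));
    }

    /** Returns the claims JSON if the signature matches, otherwise null. */
    String verify(String token) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return null;
        }
        byte[] expected = hmac(parts[0] + "." + parts[1]);
        byte[] actual;
        try {
            actual = B64D.decode(parts[2]);
        } catch (IllegalArgumentException e) {
            return null; // signature is not valid base64url
        }
        // Constant-time comparison avoids leaking how many bytes matched.
        if (!MessageDigest.isEqual(expected, actual)) {
            return null;
        }
        return new String(B64D.decode(parts[1]), StandardCharsets.UTF_8);
    }
}
```

In the real service this work is done by the Spring Security filter chain. The sketch shows only that verification needs the key, not a session store, which is why any node can handle any request.
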
Challenges & lessons

Scraping is a moving target: markup drifts, sites rate-limit, and a scraper that worked yesterday can silently return nothing today. The fix was defensive parsing, per-site strategies, and alerting on empty scrape results. The broader lesson is to treat the scraper fleet as production software, not as scripts.

Running the stack on ECS also taught me real containerization discipline: keeping images small, defining health checks, and granting least-privilege IAM permissions.
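
Defensive parsing and empty-result alerting, as described above, fit together in one small check. This sketch assumes US-style price text ("1,299.99", with a comma for thousands and a dot for decimals). DefensiveScrape, RunReport, and summarize are hypothetical names. Unparseable text yields an empty Optional instead of an exception. A run that saw no listings, or parsed no prices, is reported as broken rather than treated as "no deals".

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical defensive parsing plus a run-health check for one site's scrape. */
final class DefensiveScrape {
    // Assumes "1,299.99"-style formatting: comma = thousands separator, dot = decimal point.
    private static final Pattern PRICE = Pattern.compile("(\\d{1,3}(?:,\\d{3})+|\\d+)(\\.\\d{1,2})?");

    /** Extracts the first price-like number from raw text; empty if none is found. */
    static Optional<BigDecimal> parsePrice(String raw) {
        if (raw == null || raw.isBlank()) {
            return Optional.empty();
        }
        Matcher m = PRICE.matcher(raw);
        if (!m.find()) {
            return Optional.empty();
        }
        String digits = m.group(1).replace(",", "") + (m.group(2) == null ? "" : m.group(2));
        return Optional.of(new BigDecimal(digits));
    }

    /** Hypothetical outcome of one site's scrape run. */
    record RunReport(String site, int listingsSeen, List<BigDecimal> prices) {
        /** A run that saw nothing, or parsed nothing, is treated as broken, not as "no deals". */
        boolean looksBroken() {
            return listingsSeen == 0 || prices.isEmpty();
        }
    }

    /** Parses every raw price string, skipping the ones that do not look like prices. */
    static RunReport summarize(String site, List<String> rawPriceTexts) {
        List<BigDecimal> prices = new ArrayList<>();
        for (String raw : rawPriceTexts) {
            parsePrice(raw).ifPresent(prices::add);
        }
        return new RunReport(site, rawPriceTexts.size(), prices);
    }
}
```

Under this assumption a European-formatted price such as "1.299,99" would be misread as 1.29. That is one reason parsing rules are better owned by each site's strategy than by a single global parser.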

Results

The system runs unattended on AWS: scheduled scrapers feed the price history, users register and set thresholds through the React app, and alert emails go out within minutes of a detected drop. GitHub Actions deploys every merge to production with no manual steps.