Price Stalker

Full-stack price tracker that scrapes e-commerce sites, emails deal alerts, and recommends alternatives with AI microservices.

2025

  • Java
  • Spring Boot
  • React
  • RabbitMQ
  • Selenium
  • AWS

Overview & problem

Prices on e-commerce platforms move constantly, and the only way to catch a real deal is to keep checking, which nobody does. Price Stalker watches products for you: it scrapes listings on a schedule, stores price history, and emails you the moment a price drops below your threshold. I designed, built, deployed and operated the whole system myself.
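
The threshold check at the heart of that flow can be sketched in a few lines of Java. This is a minimal illustration, not the project's actual code: the class, record, and method names are hypothetical, and it assumes an alert fires only when a price crosses below the threshold, so the same low price does not trigger a fresh email on every scheduled scrape.

```java
import java.math.BigDecimal;
import java.util.Optional;

/** Hypothetical names throughout; a sketch of the threshold check only. */
final class PriceDropDetector {

    /** A watch a user has set on one product (illustrative shape). */
    record Watch(String productId, String userEmail, BigDecimal threshold) {}

    /** The payload a notification worker could turn into an email. */
    record Alert(String productId, String userEmail, BigDecimal price, BigDecimal threshold) {}

    /**
     * Returns an alert only when the price crosses below the threshold:
     * the previous observation was at or above it (or there was none)
     * and the current one is strictly below it.
     */
    static Optional<Alert> check(Watch watch, BigDecimal previousPrice, BigDecimal currentPrice) {
        boolean nowBelow = currentPrice.compareTo(watch.threshold()) < 0;
        boolean wasBelow = previousPrice != null
                && previousPrice.compareTo(watch.threshold()) < 0;
        if (nowBelow && !wasBelow) {
            return Optional.of(new Alert(
                    watch.productId(), watch.userEmail(), currentPrice, watch.threshold()));
        }
        return Optional.empty();
    }
}
```

Prices are compared with BigDecimal.compareTo rather than equals, so 50.0 and 50.00 count as the same amount.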

Architecture

                 ┌─────────────────────────────────────────────┐
                 │                  AWS (ECS/EC2)              │
React SPA ──────▶│  Spring Boot API ──▶ PostgreSQL (RDS)       │
 (S3 + CF)       │   │  Spring Security (JWT)                  │
                 │   ▼                                         │
                 │  RabbitMQ ◀── Scheduled scrapers            │
                 │   │           (Selenium · Jsoup · Scrapy)   │
                 │   ▼                                         │
                 │  Notification worker ──▶ SMTP (alerts)      │
                 │  AI microservices: chat · speech-to-text ·  │
                 │  product-alternative recommendations        │
                 └─────────────────────────────────────────────┘
        CI/CD: GitHub Actions ▶ ECR ▶ ECS · Secrets Manager · Cloudflare DNS

Key technical decisions

  • RabbitMQ between scraping and alerting. Scrape runs are bursty; queueing decouples the scrapers (producers) from the notification worker and lets a failed delivery be requeued and retried, instead of coupling everything to one request cycle.
  • Three scraping strategies. Selenium for JavaScript-heavy storefronts, Jsoup for fast static parsing, Scrapy for crawl-style collection. Each target site gets the tool that fits it, rather than forcing one tool to do everything.
  • Mixed schema design. Normalized core entities for integrity, with selectively denormalized read paths for the price-history queries the UI hits hardest.
  • Stateless auth. Spring Security filter chain issuing JWTs, so API nodes scale horizontally with no session affinity.
  • Secrets Manager over env files. Credentials never live in images or repos; CI/CD assumes roles instead.
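
The per-site choice of scraping tool can be pictured as a small routing table. The sketch below is illustrative only: the class name, the host names, and the fallback to the static parser for unknown hosts are all assumptions, and it returns a label rather than invoking Selenium, Jsoup, or Scrapy.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

/** Hypothetical router: maps a target host to one of the three tool families. */
final class ScraperRouter {

    enum Strategy { SELENIUM, JSOUP, SCRAPY }

    // Placeholder hosts, not real targets; real config would live outside the code.
    private static final Map<String, Strategy> BY_HOST = Map.of(
            "js-heavy-shop.example", Strategy.SELENIUM,
            "static-shop.example", Strategy.JSOUP,
            "catalog-crawl.example", Strategy.SCRAPY);

    /**
     * Picks the configured strategy for the URL's host.
     * Assumed default: unknown hosts get the cheap static parser first.
     */
    static Strategy strategyFor(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        return BY_HOST.getOrDefault(host.toLowerCase(Locale.ROOT), Strategy.JSOUP);
    }
}
```

Keeping the mapping as data means a site that starts rendering prices with JavaScript can be moved to the browser-based strategy by changing one entry.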

Challenges & lessons

Scraping is a moving target: markup drifts, sites rate-limit, and a scraper that worked yesterday silently returns nothing today. The fix was defensive parsing, per-site strategies, and alerting on empty scrape results. The lesson: treat the scraper fleet as production software, not a pile of scripts. Running the stack on ECS also taught me real containerization discipline: image size, health checks, and least-privilege IAM.
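
Alerting on empty scrape results can be as simple as counting consecutive empty runs per site. The following is a minimal sketch under assumptions: the class name, the consecutive-run threshold, and the idea of flagging after a streak rather than after a single empty run are illustrative, not a description of the deployed monitor.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical monitor: flags a site after N consecutive scrapes that parsed nothing. */
final class EmptyScrapeMonitor {

    private final int maxConsecutiveEmpty;
    private final Map<String, Integer> emptyRuns = new HashMap<>();

    EmptyScrapeMonitor(int maxConsecutiveEmpty) {
        if (maxConsecutiveEmpty < 1) {
            throw new IllegalArgumentException("maxConsecutiveEmpty must be >= 1");
        }
        this.maxConsecutiveEmpty = maxConsecutiveEmpty;
    }

    /**
     * Records one scrape run for a site and returns true when the site
     * should be flagged. Any non-empty run resets the streak.
     * Not thread-safe; a concurrent scraper pool would need synchronization.
     */
    boolean record(String site, int itemsParsed) {
        if (itemsParsed > 0) {
            emptyRuns.remove(site);
            return false;
        }
        int streak = emptyRuns.merge(site, 1, Integer::sum);
        return streak >= maxConsecutiveEmpty;
    }
}
```

Requiring a streak keeps one transient failure, such as a timeout or a rate-limit response, from paging anyone, while a selector broken by markup drift still surfaces within a few scheduled runs.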

Results

The system runs unattended on AWS: scheduled scrapers feed price history, users register and set thresholds through the React app, and alert emails go out within minutes of a detected drop. CI/CD via GitHub Actions deploys from merge to production without manual steps.