Price Stalker
Full-stack price tracker that scrapes e-commerce sites, emails deal alerts, and recommends alternatives with AI microservices.
2025
- Java
- Spring Boot
- React
- RabbitMQ
- Selenium
- AWS
Overview & problem
Prices on e-commerce platforms move constantly, and the only way to catch a real deal is to keep checking, which nobody does. Price Stalker watches products for you: it scrapes listings on a schedule, stores price history, and emails you as soon as a scrape finds the price below your threshold. I designed, built, deployed, and operated the whole system myself.
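The alert rule can be sketched in Java. This is a minimal illustration, not the project's actual code. The class and method names are hypothetical, prices are assumed to be stored as `BigDecimal`, and the rule is assumed to be edge-triggered: it alerts only when the price crosses from at-or-above the threshold to below it, so a price that stays low does not send a new email on every scheduled scrape.

```java
import java.math.BigDecimal;
import java.util.Optional;

// Hypothetical sketch of the alert rule; names and the edge-triggered policy are assumptions.
public class ThresholdCheck {

    /**
     * Returns true when the latest scraped price should trigger an alert email.
     *
     * @param previous  last stored price for this product, or empty on the first scrape
     * @param current   price from the latest scrape
     * @param threshold the user's target price
     */
    static boolean shouldAlert(Optional<BigDecimal> previous, BigDecimal current, BigDecimal threshold) {
        // compareTo ignores scale, so 50 and 50.00 are treated as equal.
        boolean nowBelow = current.compareTo(threshold) < 0;
        // With no history, treat the product as "not yet below" so a first low price still alerts.
        boolean wasBelow = previous.map(p -> p.compareTo(threshold) < 0).orElse(false);
        // Alert only on the transition from at/above the threshold to below it.
        return nowBelow && !wasBelow;
    }

    public static void main(String[] args) {
        BigDecimal threshold = new BigDecimal("50.00");
        // Drop from 59.99 to 49.99 crosses the threshold: alert.
        System.out.println(shouldAlert(Optional.of(new BigDecimal("59.99")), new BigDecimal("49.99"), threshold));
        // Already below on the previous scrape: no repeat alert.
        System.out.println(shouldAlert(Optional.of(new BigDecimal("45.00")), new BigDecimal("44.00"), threshold));
        // Equal to the threshold is not "below": no alert.
        System.out.println(shouldAlert(Optional.empty(), new BigDecimal("50"), threshold));
    }
}
```

A level-triggered rule (alert whenever the price is below the threshold) is simpler but would need its own deduplication, for example remembering when each user was last alerted.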
Architecture
                 ┌─────────────────────────────────────────────┐
                 │                AWS (ECS/EC2)                │
React SPA ──────▶│ Spring Boot API ──▶ PostgreSQL (RDS)        │
(S3 + CF)        │   │  Spring Security (JWT)                  │
                 │   ▼                                         │
                 │ RabbitMQ ◀── Scheduled scrapers             │
                 │   │          (Selenium · Jsoup · Scrapy)    │
                 │   ▼                                         │
                 │ Notification worker ──▶ SMTP (alerts)       │
                 │ AI microservices: chat · speech-to-text ·   │
                 │   product-alternative recommendations       │
                 └─────────────────────────────────────────────┘
CI/CD: GitHub Actions ▶ ECR ▶ ECS · Secrets Manager · Cloudflare DNS
Key technical decisions
- RabbitMQ between scraping and alerting. Scrape runs are bursty; a queue decouples the scrapers (producers) from the notification worker (consumer) and, because unacknowledged messages are redelivered, makes retries straightforward instead of tying everything to one request cycle.
- Three scraping strategies. Selenium for JavaScript-heavy storefronts, Jsoup for fast static parsing, Scrapy for crawl-style collection — picked per target site rather than forcing one tool to do everything.
- Mixed schema design. Normalized core entities for integrity, with selectively denormalized read paths for the price-history queries the UI hits hardest.
- Stateless auth. Spring Security filter chain issuing JWTs, so API nodes scale horizontally with no session affinity.
- Secrets Manager over env files. Credentials never live in images or repos; CI/CD assumes IAM roles instead of holding long-lived keys.
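Picking a scraping tool per target site amounts to a lookup from site to strategy. The sketch below is hypothetical: the hostnames are placeholders, the Jsoup fallback for unknown sites is an assumption, and it only chooses a strategy without invoking any tool (Scrapy is a Python framework, so in practice that branch would presumably hand work to a separate process).

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: route each product URL to one of the three scraping tools by host.
public class ScraperRouter {

    /** The three tools named in the write-up; how each one is invoked is out of scope here. */
    enum Strategy { SELENIUM, JSOUP, SCRAPY }

    private final Map<String, Strategy> byHost;
    private final Strategy fallback;

    ScraperRouter(Map<String, Strategy> byHost, Strategy fallback) {
        this.byHost = Map.copyOf(byHost); // immutable defensive copy
        this.fallback = fallback;
    }

    /** Picks the configured strategy for the URL's host, or the fallback if the host is unknown. */
    Strategy pick(String productUrl) {
        String host = URI.create(productUrl).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + productUrl);
        }
        host = host.toLowerCase(Locale.ROOT);
        // Treat "www.shop.example" and "shop.example" as the same site.
        if (host.startsWith("www.")) {
            host = host.substring(4);
        }
        return byHost.getOrDefault(host, fallback);
    }

    public static void main(String[] args) {
        // Placeholder hosts, not the sites the project actually tracks.
        ScraperRouter router = new ScraperRouter(Map.of(
                "spa-store.example", Strategy.SELENIUM,  // prices rendered client-side
                "static-shop.example", Strategy.JSOUP,   // prices present in server HTML
                "catalog.example", Strategy.SCRAPY),     // crawl many listing pages
                Strategy.JSOUP);
        System.out.println(router.pick("https://www.spa-store.example/item/42"));
        System.out.println(router.pick("https://unknown.example/p/1"));
    }
}
```

If the mapping is loaded from configuration rather than built in code, a site that starts rendering prices client-side can be moved from Jsoup to Selenium without touching the scrapers themselves.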
Challenges & lessons
Scraping is a moving target: markup drifts, sites rate-limit, and a scraper that worked yesterday silently returns nothing today. The fix was defensive parsing, per-site strategies, and alerting on empty scrape results; the broader lesson is to treat the scraper fleet as production software, not as scripts. Running the stack on ECS also taught me real containerization discipline: image size, health checks, and least-privilege IAM.
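A minimal sketch of the defensive-parsing and empty-result ideas, assuming prices arrive as raw text taken from a price element. The class and method names and the US-style number format are hypothetical; the write-up does not say how the real parsers or alerts are implemented. Parsing returns `Optional.empty()` instead of throwing, and a run in which nothing parses is flagged so markup drift surfaces as an alert rather than as silently missing data.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of defensive price parsing plus an "empty scrape" check.
public class DefensiveParsing {

    // Assumes '.' as the decimal separator and ',' for thousands (e.g. "$1,299.99").
    // Sites using "1.299,99" would need a per-site format instead.
    private static final Pattern PRICE =
            Pattern.compile("(\\d{1,3}(?:,\\d{3})+|\\d+)(?:\\.(\\d{1,2}))?");

    /** Extracts the first price-like number from scraped text; returns empty instead of throwing. */
    static Optional<BigDecimal> parsePrice(String raw) {
        if (raw == null || raw.isBlank()) {
            return Optional.empty();
        }
        Matcher m = PRICE.matcher(raw);
        if (!m.find()) {
            return Optional.empty();
        }
        String whole = m.group(1).replace(",", "");
        String fraction = m.group(2);
        return Optional.of(new BigDecimal(fraction == null ? whole : whole + "." + fraction));
    }

    /** True when a run produced no usable prices, which usually means markup drift or blocking. */
    static boolean looksBroken(List<String> rawPriceTexts) {
        long parsed = rawPriceTexts.stream()
                .map(DefensiveParsing::parsePrice)
                .filter(Optional::isPresent)
                .count();
        return parsed == 0;
    }

    public static void main(String[] args) {
        System.out.println(parsePrice("$1,299.99"));                       // Optional[1299.99]
        System.out.println(parsePrice("Out of stock"));                    // Optional.empty
        System.out.println(looksBroken(List.of("", "Price unavailable"))); // true
    }
}
```

The regex takes the first number it finds, so it should be fed only the text of the price element; given a whole page, a phrase like "Only 3 left" would parse as a price of 3.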
Results
The system runs unattended on AWS: scheduled scrapers feed price history, users register and set thresholds through the React app, and alert emails go out within minutes of a detected drop. The GitHub Actions pipeline takes a merge to production with no manual steps.