Data Processing Workflows

This document describes the end-to-end data processing workflows that keep the SG Cars Trends platform updated with the latest vehicle registration and COE bidding data.

Workflow Sequence Diagram

Workflow Components

1. Cars Data Processing Workflow

Purpose: Process monthly vehicle registration data from LTA DataMall
Trigger:
  • Scheduled: Every hour during business hours (9 AM - 6 PM, Mon-Fri)
  • Manual: POST /workflows/trigger endpoint
Process Flow:
  1. Timestamp Check: Query Redis for last_updated:cars to determine if processing is needed
  2. File Download: Fetch “Monthly New Registration of Cars by Make.zip” from LTA DataMall
  3. Checksum Validation: Compare file checksum with previous version to detect changes (steps 1-3 are sketched after this list)
  4. Data Extraction: Extract and parse CSV files if changes detected
  5. Data Transformation (see the sketch after the key fields below):
    • Clean special characters from make names
    • Normalize vehicle type classifications
    • Convert string numbers to integers
  6. Database Insert: Batch insert new records with conflict resolution
  7. Cache Update: Update Redis timestamp for successful processing
  8. Blog Generation: Create AI-powered blog post analyzing latest data
  9. Social Media: Announce new blog post across configured platforms
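Steps 1-3 form an idempotency gate: nothing downstream runs unless the data has actually changed. Below is a minimal sketch of that gate, assuming the @upstash/redis client and Node's built-in crypto module; only the last_updated:cars key comes from this document, while the checksum:cars key and the function name are illustrative.

```typescript
import { createHash } from "node:crypto";
import { Redis } from "@upstash/redis";

// Reads UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN from the environment.
const redis = Redis.fromEnv();

// Decide whether a freshly downloaded "Monthly New Registration of Cars by
// Make.zip" needs processing.
export async function carsFileNeedsProcessing(file: Buffer): Promise<boolean> {
  // Step 1: timestamp check — skip if data was already ingested this month
  // (assumed semantics; the real workflow may compare full timestamps).
  const lastUpdated = await redis.get<string>("last_updated:cars");
  const currentMonth = new Date().toISOString().slice(0, 7); // YYYY-MM
  if (lastUpdated?.startsWith(currentMonth)) return false;

  // Step 3: checksum validation — skip if the file is byte-identical to the
  // previously processed version.
  const checksum = createHash("sha256").update(file).digest("hex");
  const previousChecksum = await redis.get<string>("checksum:cars"); // illustrative key
  if (previousChecksum === checksum) return false;

  await redis.set("checksum:cars", checksum);
  return true;
}
```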
Data Sources: Monthly New Registration of Cars by Make.zip
Key Fields Processed:
  • month: YYYY-MM format
  • make: Vehicle manufacturer (normalized)
  • fuel_type: Petrol, Diesel, Electric, Hybrid
  • vehicle_type: Cars, Motorcycles, Buses, etc.
  • number: Registration count
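The transformation step (step 5) and the key fields above suggest a simple typed record and normalization pass. The sketch below is illustrative only; the CarRegistration name and the exact cleaning rules are assumptions, not the platform's actual code.

```typescript
// Shape of one processed row, based on the key fields listed above.
interface CarRegistration {
  month: string;        // YYYY-MM
  make: string;         // normalized manufacturer name
  fuel_type: string;    // Petrol, Diesel, Electric, Hybrid
  vehicle_type: string; // Cars, Motorcycles, Buses, etc.
  number: number;       // registration count
}

// Illustrative transform for one raw CSV row (all values arrive as strings).
function transformRow(raw: Record<string, string>): CarRegistration {
  return {
    month: raw.month.trim(),
    // Clean special characters and collapse whitespace in make names.
    make: raw.make.replace(/[^A-Za-z0-9 .\-]/g, "").replace(/\s+/g, " ").trim(),
    fuel_type: raw.fuel_type.trim(),
    vehicle_type: raw.vehicle_type.trim(),
    // Convert string counts (possibly with thousands separators) to integers.
    number: Number.parseInt(raw.number.replace(/,/g, ""), 10) || 0,
  };
}
```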

2. COE Data Processing Workflow

Purpose: Process Certificate of Entitlement bidding results
Trigger:
  • Scheduled: Every hour during business hours
  • Manual: POST /workflows/trigger endpoint
Process Flow:
  1. Timestamp Check: Query Redis for last_updated:coe
  2. File Download: Fetch COE bidding results from LTA DataMall
  3. Checksum Validation: Detect file changes
  4. Data Processing: Parse COE bidding data and PQP rates
  5. Database Updates: Insert bidding results and premium rates
  6. Conditional Blog Generation (see the sketch after the key fields below):
    • Generate blog posts only when both bidding exercises are complete (bidding_no = 2)
    • Provides comprehensive monthly COE analysis
  7. Social Media: Announce completed bidding results and blog posts
Data Sources:
  • COE bidding results CSV
  • Prevailing Quota Premium (PQP) rates
Key Fields Processed:
  • month: Bidding month
  • bidding_no: Exercise number (1 or 2)
  • vehicle_class: COE category
  • quota: Available certificates
  • bids_received: Total bids submitted
  • premium: Winning premium (SGD)
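The key fields map onto a small typed record, and the conditional blog generation rule from step 6 reduces to a check for a completed second exercise. A hedged sketch; the type and helper names are illustrative.

```typescript
// One COE bidding result row, based on the key fields listed above.
interface CoeBiddingResult {
  month: string;         // bidding month, YYYY-MM
  bidding_no: 1 | 2;     // exercise number
  vehicle_class: string; // COE category
  quota: number;         // available certificates
  bids_received: number; // total bids submitted
  premium: number;       // winning premium in SGD
}

// Blog generation only fires once both exercises for the month are present,
// i.e. at least one row with bidding_no = 2 (step 6 of the process flow).
function isMonthlyBiddingComplete(rows: CoeBiddingResult[], month: string): boolean {
  return rows.some((row) => row.month === month && row.bidding_no === 2);
}
```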

3. Blog Generation Workflow

Purpose: Create AI-powered market insights from processed data
AI Integration: Google Gemini API for content generation
Content Creation Process:
  1. Data Analysis: Analyze latest month’s data for trends and insights
  2. Content Generation: Use LLM to create comprehensive blog posts
  3. Structured Output: Generate title, content, excerpt, and metadata
  4. SEO Optimization: Create SEO-friendly slugs and descriptions
  5. Storage: Save to database with duplicate prevention
  6. Social Distribution: Announce new content across platforms
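A minimal sketch of the content generation step using Google's @google/generative-ai SDK. The model name, prompt wording, and JSON output shape are assumptions for illustration, not the platform's actual implementation; the platform records the real model version with each post.

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

// Structured output expected from the model (illustrative shape).
interface BlogDraft {
  title: string;
  content: string; // markdown body
  excerpt: string;
  slug: string;
}

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

export async function generateBlogDraft(dataSummary: string): Promise<BlogDraft> {
  // Model choice is illustrative; ask for JSON so the response can be parsed directly.
  const model = genAI.getGenerativeModel({
    model: "gemini-1.5-flash",
    generationConfig: { responseMimeType: "application/json" },
  });

  const prompt =
    "You are a motoring market analyst for Singapore. Using the data below, " +
    "write a blog post as JSON with keys title, content, excerpt, slug.\n\n" +
    dataSummary;

  const result = await model.generateContent(prompt);
  return JSON.parse(result.response.text()) as BlogDraft;
}
```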
Blog Post Features:
  • Executive summaries with key insights
  • Data tables for fuel type and vehicle type breakdowns
  • Market trend analysis and implications
  • Professional market commentary
  • Reading time estimation
  • AI attribution and model versioning
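Reading time estimation and SEO-friendly slugs are small, self-contained helpers; a sketch assuming the common 200-words-per-minute heuristic.

```typescript
// Rough reading time, assuming ~200 words per minute.
function estimateReadingTimeMinutes(content: string): number {
  const words = content.trim().split(/\s+/).length;
  return Math.max(1, Math.ceil(words / 200));
}

// SEO-friendly slug, e.g. "COE Results: July 2025" -> "coe-results-july-2025".
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}
```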

4. Social Media Integration Workflow

Purpose: Distribute updates across multiple social platforms
Supported Platforms:
  • Discord: Webhook-based notifications
  • LinkedIn: Business-focused updates
  • Telegram: Channel messaging
  • Twitter: Social media engagement
Publishing Flow:
  1. Platform Validation: Check configuration for each enabled platform
  2. Parallel Publishing: Simultaneously post to all configured platforms
  3. Error Handling: Graceful degradation with Discord error notifications
  4. Result Aggregation: Collect success/failure statistics
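The publishing flow amounts to: filter to configured platforms, fan out in parallel, and aggregate outcomes without letting one failure abort the rest. A sketch; the Platform and Publisher types are illustrative.

```typescript
type Platform = "discord" | "linkedin" | "telegram" | "twitter";

// Hypothetical per-platform publishers; each resolves on success or throws.
type Publisher = (message: string) => Promise<void>;

export async function publishToAll(
  publishers: Partial<Record<Platform, Publisher>>,
  message: string,
): Promise<{ succeeded: Platform[]; failed: Platform[] }> {
  // Step 1: only platforms with configuration (a publisher) are attempted.
  const enabled = Object.entries(publishers) as [Platform, Publisher][];

  // Step 2: publish to every enabled platform in parallel.
  const results = await Promise.allSettled(enabled.map(([, publish]) => publish(message)));

  // Steps 3-4: aggregate outcomes; failures degrade gracefully instead of throwing.
  const succeeded: Platform[] = [];
  const failed: Platform[] = [];
  results.forEach((result, i) => {
    (result.status === "fulfilled" ? succeeded : failed).push(enabled[i][0]);
  });
  return { succeeded, failed };
}
```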

Error Handling and Reliability

Redundancy Prevention

Checksum Validation: Files are only processed if checksums differ from previous versions
Timestamp Tracking: Redis timestamps prevent duplicate processing
Database Constraints: Unique key constraints prevent duplicate records
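At the database layer, duplicate prevention typically combines a unique constraint with a conflict clause on insert. A sketch assuming Drizzle ORM and PostgreSQL; the table and column names are illustrative and may not match the actual schema.

```typescript
import { drizzle } from "drizzle-orm/node-postgres";
import { integer, pgTable, text, uniqueIndex } from "drizzle-orm/pg-core";
import { Pool } from "pg";

// Illustrative table; the real schema may differ.
const cars = pgTable(
  "cars",
  {
    month: text("month").notNull(),
    make: text("make").notNull(),
    fuel_type: text("fuel_type").notNull(),
    vehicle_type: text("vehicle_type").notNull(),
    number: integer("number").notNull(),
  },
  (t) => ({
    uniq: uniqueIndex("cars_unique").on(t.month, t.make, t.fuel_type, t.vehicle_type),
  }),
);

const db = drizzle(new Pool({ connectionString: process.env.DATABASE_URL }));

// Batch insert; rows that collide with the unique index are silently skipped.
export async function insertCars(rows: (typeof cars.$inferInsert)[]): Promise<void> {
  if (rows.length === 0) return;
  await db.insert(cars).values(rows).onConflictDoNothing();
}
```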

Error Recovery

Workflow State: QStash maintains workflow state and enables retries
Discord Notifications: Failed workflows trigger Discord alerts with error details
Graceful Degradation: Social media failures don’t block data processing
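The Discord alert itself can be a single POST to the webhook configured via DISCORD_WEBHOOK_URL; a sketch using the standard webhook payload format.

```typescript
// Send a workflow failure alert to the configured Discord webhook.
// Discord webhooks accept a JSON body with a "content" field.
export async function notifyWorkflowFailure(workflow: string, error: unknown): Promise<void> {
  const message = error instanceof Error ? error.message : String(error);
  await fetch(process.env.DISCORD_WEBHOOK_URL!, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      content: `Workflow "${workflow}" failed: ${message}`,
    }),
  });
}
```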

Monitoring and Logging

CloudWatch Logs: Comprehensive logging for all workflow steps
Structured Logging: JSON-formatted logs for easy parsing
Error Tracking: Detailed error messages with context
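Structured JSON log lines are easy to filter in CloudWatch Logs Insights; a minimal sketch (field names are illustrative).

```typescript
// Emit one JSON log line per event; CloudWatch captures stdout from Lambda.
function logEvent(
  level: "info" | "error",
  message: string,
  context: Record<string, unknown> = {},
): void {
  console.log(JSON.stringify({ timestamp: new Date().toISOString(), level, message, ...context }));
}

// Example usage inside a workflow step:
logEvent("info", "cars.checksum.unchanged", { workflow: "cars", step: "checksum-validation" });
```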

Configuration

Environment Variables

Core Workflow:
  • QSTASH_TOKEN: QStash authentication
  • LTA_DATAMALL_API_KEY: LTA DataMall access
  • DATABASE_URL: PostgreSQL connection
  • UPSTASH_REDIS_REST_*: Redis configuration
AI Integration:
  • GEMINI_API_KEY: Google Gemini AI access
Social Media:
  • DISCORD_WEBHOOK_URL: Error notifications
  • LINKEDIN_*: LinkedIn API credentials
  • TELEGRAM_*: Bot configuration
  • TWITTER_*: Twitter API credentials

Scheduling Configuration

Cron Schedule: */60 0-10 * * 1-5
  • Every hour during Singapore business hours, roughly 9 AM to 6 PM (the 0-10 hour range is specified in UTC, and the */60 minute field is equivalent to minute 0)
  • Monday through Friday
  • Aligned with LTA DataMall update patterns
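The schedule can be registered programmatically with QStash; a sketch using the @upstash/qstash client, with a placeholder destination URL standing in for the deployed POST /workflows/trigger endpoint.

```typescript
import { Client } from "@upstash/qstash";

const qstash = new Client({ token: process.env.QSTASH_TOKEN! });

// Register the hourly business-hours trigger. The destination URL below is a
// placeholder, not the platform's real endpoint.
export async function registerWorkflowSchedule(): Promise<void> {
  await qstash.schedules.create({
    destination: "https://api.example.com/workflows/trigger",
    cron: "*/60 0-10 * * 1-5",
  });
}
```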

Performance Considerations

Batch Processing

  • Database inserts use batch operations for efficiency
  • Large datasets processed in chunks to avoid timeouts
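A sketch of the chunking pattern; the chunk size is arbitrary and the insert callback stands in for whatever batch insert the workflow uses.

```typescript
// Insert rows in fixed-size chunks to keep each statement small and avoid
// timeouts on unusually large months.
async function insertInChunks<T>(
  rows: T[],
  insert: (chunk: T[]) => Promise<void>,
  size = 500,
): Promise<void> {
  for (let i = 0; i < rows.length; i += size) {
    await insert(rows.slice(i, i + size));
  }
}
```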

Caching Strategy

  • Redis caching for API responses
  • Timestamp-based cache invalidation
  • CDN caching for static content

Resource Optimization

  • ARM64 Lambda functions for cost efficiency
  • 120-second timeout for long-running processes
  • Memory allocation optimized for CSV processing