Revised Architecture Proposal
Final bot design incorporating backtest data and AI review feedback. Updated Feb 19, 2026.
● Executive Summary
The "reading the thermometer" strategy is validated. Highs are the strong suit at 92.9% accuracy across all stations, and lows work in 5 of 7 cities at 80%+ accuracy at dawn. After critical review by Grok-4 and Gemini-2.5-Flash, three major architectural changes were adopted.
Change 1: Replaced Claude AI decision engine with deterministic rule-based logic. LLMs are non-deterministic, slow (~12s/call), and add zero value to a pure math decision. The rule engine executes in ~1ms.
Change 2: Identified pricing validation as the critical next step before writing any code. High accuracy means nothing if Kalshi prices already reflect that accuracy. Must prove edge exists with historical price data first.
Change 3: Made buffer rules and trading windows station-specific. Miami highs are tighter (±1.5°F) because the climate is less volatile. Denver lows are skipped entirely because 60% accuracy is a coin flip.
▲ What Changed: Before vs. After
Side-by-side comparison of the original architecture proposal versus the revised design after AI review feedback.
Original Proposal
Decision Engine: Claude Sonnet via claude --print (~12s per call). LLM interprets METAR data and decides whether to trade. Non-deterministic outputs.
Buffer Rule: Flat ±2°F applied uniformly to every station. No differentiation between volatile and stable climates.
Trading Windows: Fixed per timezone. Same cutoff for all stations within a timezone regardless of local accuracy patterns.
Profitability: Assumed directly from accuracy numbers. No analysis of actual Kalshi contract prices or market efficiency.
Claude's Role: Core decision maker sitting in the hot path of every trade decision.
Crash Recovery: Windows Task Scheduler restart only. No state persistence across crashes.
Revised Proposal
Decision Engine: Deterministic Node.js rule engine (~1ms). Pure if/else logic with no ambiguity. Same input always produces same output.
Buffer Rule: Station-specific: ±1.5°F (Miami/LA highs), ±2°F (standard), ±3°F (volatile lows). Tuned to backtest data.
Trading Windows: Station-specific thresholds derived from backtest accuracy data. Each station has its own optimal entry time.
Profitability: Must validate with historical Kalshi prices before building anything. This is the go/kill gate.
Claude's Role: Daily briefing advisor (weather fronts, anomalies) + post-settlement audit. Never in the hot path.
Crash Recovery: Task Scheduler + state.json persistence + D1 dedup layer + Discord heartbeat monitoring.
| Aspect | Original | Revised | Impact |
| --- | --- | --- | --- |
| Decision Engine | Claude Sonnet via claude --print (~12s/call) | Deterministic Node.js rules (~1ms) | ~12,000x faster |
| Buffer Rule | Flat ±2°F everywhere | Station-specific: ±1.5°F / ±2°F / ±3°F | Data-driven |
| Trading Windows | Fixed per timezone | Station-specific thresholds from backtest | Per-station |
| Profitability | Assumed from accuracy | Must validate with historical Kalshi prices FIRST | Go/kill gate |
| Claude's Role | Core decision maker | Daily briefing advisor + post-settlement audit | Advisory only |
| Crash Recovery | Task Scheduler only | Task Scheduler + state.json + D1 dedup + Discord heartbeat | 4-layer safety |
◆ Station Configuration (7 Stations)
Per-station configuration derived from backtest analysis. Each station has its own trading windows, accuracy thresholds, buffer sizes, and tier assignment. Tier 1 stations trade both lows and highs. Tier 2 stations trade highs only.
| Station | City | TZ | Lows Window | Lows Acc. | Highs Window | Highs Acc. | Buffer (L) | Buffer (H) | Tier |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KMDW | Chicago | CT | Buy after 6 AM | 82% | Buy after 2 PM | 84% | ±2°F | ±2°F | Tier 1 |
| KNYC | NYC | ET | Buy after 5 AM | 80% | Buy after 2 PM | 85% | ±2°F | ±2°F | Tier 1 |
| KMIA | Miami | ET | Buy after 4 AM | 82% | Buy after 1 PM | 92% | ±2°F | ±1.5°F | Tier 1 |
| KLAX | LA | PT | Buy after 4 AM | 84% | Buy after 11 AM | 81% | ±2°F | ±1.5°F | Tier 1 |
| KAUS | Austin | CT | SKIP LOWS | 71% max | Buy after 2 PM | 84% | N/A | ±2°F | Tier 2 |
| KDEN | Denver | MT | SKIP LOWS | 60% max | Buy after 2 PM | 91% | N/A | ±2°F | Tier 2 |
| KPHL | Philly | ET | Buy after 5 AM | 80% | Buy after 2 PM | 89% | ±2°F | ±2°F | Tier 1 |
5 Tier 1 stations (lows + highs) · 2 Tier 2 stations (highs only)
Tier logic: Tier 1 stations have lows accuracy ≥ 80% and trade both contract types. Tier 2 stations (Austin, Denver) have lows accuracy below the profitability threshold and only trade highs. This prevents bleeding money on unreliable low forecasts. Maximum 12 trades per day: 5 lows + 7 highs.
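One way to encode this table is a single config object keyed by station; the field names below are illustrative sketches, not a fixed schema:
// Hypothetical per-station config entry — field names are illustrative, not a fixed schema
const STATIONS = {
  KMIA: {
    city: 'Miami',
    tz: 'America/New_York',
    tier: 1,
    lows:  { windowLocal: '04:00', accuracy: 0.82, buffer: 2.0 },
    highs: { windowLocal: '13:00', accuracy: 0.92, buffer: 1.5 },
  },
  KDEN: {
    city: 'Denver',
    tz: 'America/Denver',
    tier: 2,
    lows:  null, // skipped: 60% max accuracy is a coin flip
    highs: { windowLocal: '14:00', accuracy: 0.91, buffer: 2.0 },
  },
  // ...the remaining five stations follow the same shape
};
Adding a new city (Open Question 3) then means adding one entry here, backed by its own backtest.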
■ Decision Engine: Core Logic
The heart of the system is a pure deterministic function. No LLM in the hot path. No network calls beyond METAR and Kalshi. Same input always produces the same output. Executes in under 1 millisecond.
// The entire decision engine — pure math, no AI
const HOLD = 'HOLD';
const BUY_YES = 'BUY_YES';

function shouldTrade(station, contractType, metarAgeMinutes, bracket, kalshiPrice) {
  // 1. Check if we're in the trading window
  if (!isInTradingWindow(station, contractType)) return HOLD;

  // 2. Check data freshness (a stale observation means no trade)
  if (metarAgeMinutes > 20) return HOLD;

  // 3. Get the station-specific buffer
  const buffer = getBuffer(station, contractType);

  // 4. For LOWS: is the running min well inside the bracket?
  if (contractType === 'LOW') {
    const runningMin = getRunningMin(station);
    if (runningMin <= bracket.upper - buffer && runningMin >= bracket.lower + buffer) {
      // 5. Check the Kalshi price: is there actually an edge?
      if (kalshiPrice.yes < accuracy[station].lows / 100) {
        return BUY_YES;
      }
    }
  }

  // 6. For HIGHS: is the running max well inside the bracket?
  if (contractType === 'HIGH') {
    const runningMax = getRunningMax(station);
    if (runningMax <= bracket.upper - buffer && runningMax >= bracket.lower + buffer) {
      if (kalshiPrice.yes < accuracy[station].highs / 100) {
        return BUY_YES;
      }
    }
  }

  return HOLD;
}
Step-by-Step Breakdown
Step 1 — Trading Window Gate: Each station has a specific time after which trading is allowed. For example, KMIA highs open at 1 PM ET because backtest data shows 92% accuracy by that point. Before this time, the function returns HOLD unconditionally. This is the first line of defense against premature trades.
Step 2 — Data Freshness Check: METAR data older than 20 minutes is considered stale. Weather stations occasionally go offline or experience reporting delays. If the last observation is too old, we refuse to trade. This prevents acting on outdated information that could be degrees off from current conditions.
Step 3 — Station-Specific Buffer: The buffer is not a flat value. Miami and LA highs use ±1.5°F because their climates are less volatile. Standard stations use ±2°F. Volatile low temperature environments would use ±3°F. The buffer ensures the current reading is "well inside" the bracket, not just barely touching the edge.
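Concretely, the step-3 lookup can be a two-line function over the hypothetical STATIONS config sketched earlier:
// Minimal sketch of the step-3 buffer lookup, reusing the illustrative STATIONS config above
function getBuffer(station, contractType) {
  const side = contractType === 'HIGH' ? 'highs' : 'lows';
  return STATIONS[station][side].buffer; // e.g., 1.5 for KMIA highs, 2.0 standard
}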
Steps 4-5 — Lows Logic: For low temperature contracts, we track the running minimum temperature since midnight. If that running min sits comfortably inside the Kalshi bracket (accounting for buffer), we have a high-confidence read. But we only buy if the Kalshi YES price is less than our historical accuracy — that's the edge. If Miami lows are 82% accurate and the contract is priced at 75 cents, there's a 7-cent expected edge.
Step 6 — Highs Logic: Mirror image of lows. Track the running maximum temperature. If the running max is comfortably inside the bracket and the price leaves an edge, buy YES. Denver highs at 91% accuracy are extremely strong — if the contract is priced at 80 cents, that's an 11-cent expected edge per trade.
Critical Insight: This entire function is pure arithmetic. There is no LLM inference, no API call to Claude, no natural language interpretation. The decision is: Is the temperature inside the bracket with buffer? Is the price below our accuracy? If both YES, trade. Otherwise, hold. This is why the original Claude-in-the-loop design was replaced — an LLM adds latency, cost, and non-determinism to a problem that has an exact mathematical solution.
◼ System Architecture: Runtime Flow
The production system is a Node.js daemon running on Windows, polling data every 5 minutes. All decisions are made locally with no external AI calls in the critical path.
Core Trading Pipeline
IEM METAR API (data source, polled every 5 min) → Polling Daemon (Node.js process) → Rule Engine (<1 ms decisions) → Kalshi API (order placement) → D1 Trade Log (Cloudflare D1 storage)
Daily Claude Advisor Flow (Non-Critical Path)
6:00 AM ET cron trigger → Claude Daily Briefing (claude --print) → Weather Context (fronts, anomalies, flags) → Confidence Multipliers (adjusts the day's thresholds)
Separation of concerns: The Claude advisor runs once per day at 6:00 AM, well before any trading windows open. It analyzes weather fronts, identifies anomalous conditions (e.g., cold fronts sweeping through Denver), and can adjust confidence multipliers for the day. But it never touches the hot path. If Claude is down or slow, the bot trades normally with default confidence. The advisor is a nice-to-have, not a dependency.
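To guarantee the advisor stays out of the hot path, the multiplier read can default to neutral whenever the briefing is missing or stale; a sketch, with the briefing file shape hypothetical:
// Sketch: the advisor is optional — a missing or stale briefing degrades to neutral.
// `briefing` is whatever the 6 AM run wrote to disk (shape is hypothetical).
function getConfidenceMultiplier(briefing, station, todayIsoDate) {
  if (!briefing || briefing.date !== todayIsoDate) return 1.0; // advisor down: trade normally
  return briefing.multipliers?.[station] ?? 1.0;
}
A multiplier below 1.0 on a flagged day could, for example, shrink the effective accuracy used in the price-edge check, so the bot demands a cheaper contract before trading.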
◉ Data Pipeline (Observation-Only)
This strategy uses zero forecast data. We are not predicting the weather — we are reading the thermometer and betting that it won't move far. All data sources are observational or market-based.
Primary: IEM METAR API
https://mesonet.agron.iastate.edu/cgi-bin/request/asos.py
- Poll frequency: Every 5 minutes
- Data returned: Latest METAR observation (temperature, dewpoint, wind, timestamp)
- Coverage: All 7 stations (KMDW, KNYC, KMIA, KLAX, KAUS, KDEN, KPHL)
- Reliability: Free public API, operated by Iowa State University. Extremely stable — rarely goes down
- Latency: METAR observations are typically 5-20 minutes behind real-time (inherent in the reporting system)
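A polling sketch against this endpoint; the query parameter names and CSV shape below are from memory of the IEM ASOS service and should be verified against its documentation (including whether the end date is exclusive):
// Sketch: fetch today's temperature observations for one station from IEM.
// Parameter names and the station,valid,tmpf CSV shape are assumptions — verify
// against the IEM ASOS download docs before relying on this.
async function fetchTodayTemps(station) {
  const now = new Date();
  const tomorrow = new Date(now.getTime() + 24 * 60 * 60 * 1000);
  const url = new URL('https://mesonet.agron.iastate.edu/cgi-bin/request/asos.py');
  const params = {
    station,                                  // e.g., 'KMIA'
    data: 'tmpf',                             // air temperature in °F
    year1: now.getUTCFullYear(), month1: now.getUTCMonth() + 1, day1: now.getUTCDate(),
    year2: tomorrow.getUTCFullYear(), month2: tomorrow.getUTCMonth() + 1, day2: tomorrow.getUTCDate(),
    tz: 'Etc/UTC',
    format: 'onlycomma',                      // plain CSV, no debug header
  };
  for (const [k, v] of Object.entries(params)) url.searchParams.set(k, String(v));
  const res = await fetch(url);
  if (!res.ok) throw new Error(`IEM request failed: ${res.status}`);
  const rows = (await res.text()).trim().split('\n').slice(1); // drop CSV header
  return rows
    .map((r) => r.split(','))
    .filter(([, , tmpf]) => tmpf !== 'M')     // 'M' marks missing data
    .map(([stn, valid, tmpf]) => ({ station: stn, observedAt: valid, tempF: Number(tmpf) }));
}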
Settlement: IEM CF6 JSON API
https://mesonet.agron.iastate.edu/json/cf6.py
- Purpose: Post-settlement audit and accuracy tracking
- Data returned: Official CF6 climate data (daily max/min temperatures as reported by NWS)
- Usage: Compare our running min/max against official settlement values to track drift and accuracy
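A settlement-audit sketch; the station/year parameters and the results field shape are assumptions about this JSON service and should be confirmed with a live call:
// Sketch: pull official CF6 daily max/min for the post-settlement audit.
// Parameter and field names are assumptions — confirm against a live response.
async function fetchCf6Daily(station, year) {
  const res = await fetch(
    `https://mesonet.agron.iastate.edu/json/cf6.py?station=${station}&year=${year}`
  );
  if (!res.ok) throw new Error(`CF6 request failed: ${res.status}`);
  const { results } = await res.json();
  // Assumed row shape: { valid: 'YYYY-MM-DD', high: 87, low: 71, ... }
  return new Map(results.map((d) => [d.valid, { high: d.high, low: d.low }]));
}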
Market: Kalshi REST API
https://api.elections.kalshi.com/trade-api/v2/
- Purpose: Contract prices, order book depth, order placement
- Auth: API key-based authentication
- Rate limits: Generous for retail (specific limits TBD per Kalshi docs)
- Critical data: YES price for target brackets — used to determine if edge exists
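A price-lookup sketch against the v2 API; the /markets/{ticker} path and the yes_ask field (in cents) follow our reading of Kalshi's public docs but should be verified, and any required auth headers are omitted:
// Sketch: read the current YES ask for a bracket. Endpoint path and response
// field names (market.yes_ask, in cents) are assumptions — verify against
// Kalshi's trade-api/v2 docs; signed auth headers, if required, are omitted.
async function getYesPrice(marketTicker) {
  const res = await fetch(
    `https://api.elections.kalshi.com/trade-api/v2/markets/${marketTicker}`
  );
  if (!res.ok) throw new Error(`Kalshi request failed: ${res.status}`);
  const { market } = await res.json();
  return market.yes_ask / 100; // cents → dollars, e.g., 85 → 0.85
}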
Not Used
No forecast data. We do not use NWS forecasts, GFS model output, or any predictive weather data. The entire strategy is based on the observation that once the thermometer reads a value late enough in the day, it rarely moves outside the bracket. This is a nowcasting strategy, not a forecasting strategy.
Data Flow Timing
Weather observation → METAR report (typically 5-20 min behind real-time) → IEM ingest (+2-5 min delay) → 5-minute poll → rule engine → order placement
End-to-end latency: From weather observation to trade execution is approximately 7-25 minutes (dominated by METAR reporting delay + IEM ingest). This is perfectly acceptable for a strategy that trades once per day per station. We are not high-frequency trading — we are making one deliberate decision when the data is ripe.
▣ Reliability & Recovery (4-Layer Safety)
The bot runs on a Windows desktop machine. It must survive crashes, restarts, and transient failures without duplicating trades or missing trading windows.
Layer 1: Auto-Start
Windows Task Scheduler triggers the Node.js daemon on user login and on schedule. If the process crashes, the scheduler re-launches it. The daemon checks state.json on startup to restore context.
Layer 2: State Persistence
state.json is written to disk after every significant event (new METAR reading, trade executed, window entered). It persists:
- Running min/max per station for today
- Last METAR timestamp per station
- Today's executed trades (station + type + price)
- Daily Claude briefing results (confidence multipliers)
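A minimal write-on-change sketch; writing to a temp file and renaming keeps state.json intact even if the process dies mid-write (the path is illustrative):
// Sketch: atomic-ish state persistence — write a temp file, then rename,
// so a crash mid-write never leaves a truncated state.json behind.
const fs = require('node:fs/promises');
const STATE_PATH = 'state.json'; // illustrative path

async function saveState(state) {
  const tmp = `${STATE_PATH}.tmp`;
  await fs.writeFile(tmp, JSON.stringify(state, null, 2));
  await fs.rename(tmp, STATE_PATH); // rename replaces the old file in one step
}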
Layer 3: D1 Dedup
Cloudflare D1 database serves as the source of truth for trade history. The dedup key is {date}_{station}_{type} (e.g., 2026-02-19_KMIA_HIGH). Before placing any order, the bot checks D1. If a record exists for today's key, the trade is skipped. This prevents duplicates even if state.json is lost.
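The dedup gate could look like the sketch below; the trades table and dedup_key column are hypothetical names, and d1.query stands in for however the bot reaches D1 (its HTTP API or a Worker):
// Sketch: D1-backed dedup gate. Table/column names are hypothetical, and
// `d1.query` is a placeholder for the actual D1 client.
async function alreadyTraded(date, station, type) {
  const key = `${date}_${station}_${type}`; // e.g., 2026-02-19_KMIA_HIGH
  const rows = await d1.query(
    'SELECT 1 FROM trades WHERE dedup_key = ? LIMIT 1', [key]
  );
  return rows.length > 0; // existing record for today's key → skip the trade
}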
Layer 4: Discord Monitoring
Discord webhook provides real-time visibility:
- Trade alerts: Every executed trade is posted with station, bracket, price, and reasoning
- Error alerts: API failures, stale data warnings, circuit breaker triggers
- Daily heartbeat: 8:00 AM "I'm alive" message with status summary
- End-of-day report: P&L summary, trades executed, settlement comparison
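Discord webhooks accept a plain JSON POST with a content field, so the alert layer reduces to one helper (webhook URL sourced from an environment variable here):
// Sketch: post an alert to the configured Discord webhook.
async function discordSend(content) {
  const res = await fetch(process.env.DISCORD_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content }), // e.g., 'BUY_YES KMIA_HIGH @ $0.85, edge 0.07'
  });
  if (!res.ok) console.error(`Discord alert failed: ${res.status}`);
}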
Circuit Breakers
| Condition | Action | Recovery |
| --- | --- | --- |
| Kalshi API errors | Exponential backoff | 1s, 2s, 4s, 8s... max 5 min |
| IEM stale data (>20 min) | Pause station | Resume when fresh METAR arrives |
| 5 consecutive losses | Full halt | Manual review required |
| Daily loss limit hit | Full halt | Auto-resume next trading day |
| D1 unreachable | Degrade | Fall back to state.json dedup |
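The first row of the table could be implemented as a capped exponential retry, a minimal sketch:
// Sketch: exponential backoff for Kalshi API calls — 1s, 2s, 4s... capped at 5 min.
async function withBackoff(fn, maxDelayMs = 5 * 60 * 1000) {
  let delay = 1000;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      console.error(`Kalshi call failed, retrying in ${delay / 1000}s: ${err.message}`);
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay = Math.min(delay * 2, maxDelayMs); // double each attempt, cap at 5 min
    }
  }
}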
// Crash recovery flow on startup
async function initialize() {
  // 1. Load local state (empty default if state.json is missing)
  const state = await loadStateJson();

  // 2. Verify against D1 (source of truth)
  const todaysTrades = await d1.query('SELECT * FROM trades WHERE date = ?', [today]);

  // 3. Reconcile: D1 wins on conflicts
  state.trades = reconcile(state.trades, todaysTrades);

  // 4. If running min/max is stale, fetch recent METARs to rebuild
  const thirtyMinutesAgo = Date.now() - 30 * 60 * 1000;
  if (state.lastMetar < thirtyMinutesAgo) {
    state.runningMinMax = await rebuildFromIEM(today);
  }

  // 5. Post Discord heartbeat
  await discord.send(`Bot restarted. State recovered. ${todaysTrades.length} trades already placed.`);
}
⚠ Risk Management: Capital Protection
Conservative position sizing designed for capital preservation. The bot is not designed to get rich — it's designed to grind out small edges consistently without ever blowing up.
Position Limits
| Parameter | Value | Rationale |
| --- | --- | --- |
| Max per contract | $5-$10 | Configurable, starts at $5 |
| Trades per station/day | 1 | One trade per city per type per day |
| Max daily exposure | $60-$120 | 12 trades × $5-$10 each |
| Daily loss limit | Configurable | Triggers full halt for the day |
What We Do NOT Do
- ✗ No shorting — we only buy YES contracts on brackets we're confident about
- ✗ No leveraging — each trade is a fixed dollar amount, no margin
- ✗ No doubling down — if a trade is placed, we don't add to the position
- ✗ No martingale — losses don't increase bet size; they trigger circuit breakers
- ✗ No forecasting — we don't bet on weather that hasn't happened yet
Edge Requirement
Minimum price edge: A trade is only placed when the Kalshi YES contract price is lower than the historical accuracy for that station and contract type. This ensures positive expected value over time.
// Example: KMIA highs
const historicalAccuracy = 0.92; // 92% backtest accuracy
const kalshiYesPrice = 0.85;     // 85 cents
const expectedValue = historicalAccuracy - kalshiYesPrice; // 0.92 - 0.85 = +0.07
// +7 cents per $1 of contract payout
// On a $10 trade: EV = +$0.70 per trade
// Over 30 days: EV = +$21 from KMIA highs alone
Worst-Case Scenario
Maximum single-day loss: If every trade loses (all 12 contracts settle wrong), maximum loss is $60-$120 depending on position size. The 5-consecutive-loss circuit breaker would halt trading well before this point in practice. Over a month, even a 20% loss rate on $5 trades means only ~$72 in losses against expected gains of ~$200+ (if prices offer edges).
▶ Implementation Roadmap (4 Phases)
Strictly phased implementation. Phase 1 is the go/kill gate — if pricing data doesn't show an edge, the project is killed before any code is written.
Phase 1: Pricing Validation (CRITICAL — Before Writing Any Code)
This is the kill gate. Everything else is contingent on proving that an edge exists in Kalshi's pricing. High accuracy is meaningless if the market already prices it in.
- Pull historical Kalshi price data for all 7 stations — ideally several months of contract prices at the times we would trade
- Backtest profitability: for each station/window/contract type, compute the expected edge (historical accuracy minus average YES price, per the Edge Requirement above) across all historical days, as sketched below
- Determine actual edge: Does buying YES at 2 PM for KDEN highs (91% accurate) actually pay off after Kalshi's fees and spreads?
- Kill/go decision: If expected value per trade is < $0.02, the project is not viable at current position sizes. Either find higher edges or shelve the project.
Key Phase 1 metric, currently unknown: average YES price at trade time.
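A sketch of the go/kill computation, assuming Phase 1 produces one row per historical day per station/type with the YES price at our trade time and the settlement outcome (the row shape and input names are hypothetical):
// Sketch: measured edge per station/type from historical prices.
// Row shape is hypothetical: { station, type, yesPriceAtWindow, won }
function measuredEdge(rows, feePerContract = 0) {
  const winRate = rows.filter((r) => r.won).length / rows.length;
  const avgPrice = rows.reduce((sum, r) => sum + r.yesPriceAtWindow, 0) / rows.length;
  return winRate - avgPrice - feePerContract; // edge per $1 of contract payout
}

// Go/kill gate: under $0.02 of edge, the station/type is shelved
const verdict = measuredEdge(kdenHighRows) >= 0.02 ? 'GO' : 'KILL'; // kdenHighRows: hypothetical input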
Phase 2: Build Core System
Only proceed if Phase 1 confirms positive expected value. Build the minimum viable trading system.
- Node.js daemon with the deterministic rule engine as described in Section 5
- METAR polling module — fetches from IEM every 5 min, updates running min/max per station
- Kalshi API integration — authentication, contract lookup, order placement, position tracking
- D1 trade logging — Cloudflare D1 database for trade history and dedup
- Discord notifications — webhook integration for trade alerts, errors, and heartbeat
Phase 3: Harden (Production Readiness)
Make the system resilient to real-world failures. No trade should be lost or duplicated due to infrastructure issues.
- Windows Task Scheduler auto-start configuration
- State persistence — state.json write-on-change with crash recovery logic
- Circuit breakers — exponential backoff, stale data pause, consecutive loss halt
- Claude daily briefing — 6:00 AM advisor integration for confidence multipliers
- Post-settlement audit — compare trades against CF6 official data, track actual vs. expected accuracy
Phase 4: Go Live (Graduated Deployment)
Slow, methodical deployment. Start with paper trading, graduate to real money, scale up only with proven results.
- Paper trading for 2 weeks — bot runs live but does not place real orders; logs what it would have traded
- Scale to $5/trade real money — minimum position size, monitor daily P&L closely
- Monitor and adjust — tune buffers, windows, and confidence multipliers based on live performance
- Graduate to full position sizing — $10/trade after 30 days of proven profitability
Phase 1: Price Validation (1-2 weeks) → Phase 2: Build Core System → Phase 3: Harden → Phase 4: Go Live (2 weeks paper + scale)
? Open Questions (Must Resolve)
These questions must be answered during Phase 1 before committing to building the full system. Each directly impacts the viability of the project.
1. What does the Kalshi order book look like at 2 PM / 5 AM?
This determines whether an edge actually exists at the times we want to trade. If the order book is thin or the spread is wide, our theoretical edge may evaporate in execution. Need to observe live order books across multiple days to understand typical liquidity and spreads at our target trading windows.
2. Can we get historical Kalshi prices via API or do we need to scrape?
Phase 1 requires historical contract prices at specific times of day. If Kalshi's API provides historical snapshots, this is straightforward. If not, we may need to build a scraper or poll live data for 2-4 weeks before we can run the profitability backtest. This directly impacts Phase 1 timeline.
3. Should we add more cities if Kalshi expands?
The architecture supports adding new stations easily (just add a config entry). But each new station needs its own backtest analysis to determine accuracy, optimal windows, and buffer sizes. No station should be traded without at least 60 days of historical accuracy data proving ≥ 80% reliability.
4. What's the optimal position sizing model after we have price data?
Currently using flat $5-$10 per trade. With price data, we could implement Kelly Criterion or fractional Kelly sizing to optimize capital allocation. Higher-edge stations (KMIA highs at 92%) could justify larger positions than lower-edge stations (KLAX highs at 81%). This optimization is a Phase 4 concern.
Bottom line: Questions 1 and 2 are Phase 1 blockers. Until we know what Kalshi prices look like at our target trading times, we cannot determine if the strategy is profitable. Everything else is academic until that data is in hand.