Revised Architecture Proposal
Final bot design incorporating backtest data and AI review feedback. Updated Feb 19, 2026.
● Executive Summary
The "reading the thermometer" strategy is validated. Highs are the strong suit at 92.9% accuracy across all stations, and lows work in 5 of 7 cities at 80%+ accuracy at dawn. After critical review by Grok-4 and Gemini-2.5-Flash, three major architectural changes were adopted.
Change 1: Replaced Claude AI decision engine with deterministic rule-based logic. LLMs are non-deterministic, slow (~12s/call), and add zero value to a pure math decision. The rule engine executes in ~1ms.
Change 2: Identified pricing validation as the critical next step before writing any code. High accuracy means nothing if Kalshi prices already reflect that accuracy. Must prove edge exists with historical price data first.
Change 3: Made buffer rules and trading windows station-specific. Miami highs are tighter (±1.5°F) because the climate is less volatile. Denver lows are skipped entirely because 60% accuracy is a coin flip.
▲ What Changed: Before vs. After
Side-by-side comparison of the original architecture proposal versus the revised design after AI review feedback.
Original Proposal
Decision Engine: Claude Sonnet via claude --print (~12s per call). LLM interprets METAR data and decides whether to trade. Non-deterministic outputs.
Buffer Rule: Flat ±2°F applied uniformly to every station. No differentiation between volatile and stable climates.
Trading Windows: Fixed per timezone. Same cutoff for all stations within a timezone regardless of local accuracy patterns.
Profitability: Assumed directly from accuracy numbers. No analysis of actual Kalshi contract prices or market efficiency.
Claude's Role: Core decision maker sitting in the hot path of every trade decision.
Crash Recovery: Windows Task Scheduler restart only. No state persistence across crashes.
Revised Proposal
Decision Engine: Deterministic Node.js rule engine (~1ms). Pure if/else logic with no ambiguity. Same input always produces same output.
Buffer Rule: Station-specific: ±1.5°F (Miami/LA highs), ±2°F (standard), ±3°F (volatile lows). Tuned to backtest data.
Trading Windows: Station-specific thresholds derived from backtest accuracy data. Each station has its own optimal entry time.
Profitability: Must validate with historical Kalshi prices before building anything. This is the go/kill gate.
Claude's Role: Daily briefing advisor (weather fronts, anomalies) + post-settlement audit. Never in the hot path.
Crash Recovery: Task Scheduler + state.json persistence + D1 dedup layer + Discord heartbeat monitoring.
| Aspect | Original | Revised | Impact |
| --- | --- | --- | --- |
| Decision Engine | Claude Sonnet via claude --print (~12s/call) | Deterministic Node.js rules (~1ms) | ~12,000x faster |
| Buffer Rule | Flat ±2°F everywhere | Station-specific: ±1.5°F / ±2°F / ±3°F | Data-driven |
| Trading Windows | Fixed per timezone | Station-specific thresholds from backtest | Per-station |
| Profitability | Assumed from accuracy | Must validate with historical Kalshi prices FIRST | Go/kill gate |
| Claude's Role | Core decision maker | Daily briefing advisor + post-settlement audit | Advisory only |
| Crash Recovery | Task Scheduler only | Task Scheduler + state.json + D1 dedup + Discord heartbeat | 4-layer safety |
◆ Station Configuration (7 Stations)
Per-station configuration derived from backtest analysis. Each station has its own trading windows, accuracy thresholds, buffer sizes, and tier assignment. Tier 1 stations trade both lows and highs. Tier 2 stations trade highs only.
| Station | City | TZ | Lows Window | Lows Acc. | Highs Window | Highs Acc. | Buffer (L) | Buffer (H) | Tier |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KMDW | Chicago | CT | Buy after 6 AM | 82% | Buy after 2 PM | 84% | ±2°F | ±2°F | Tier 1 |
| KNYC | NYC | ET | Buy after 5 AM | 80% | Buy after 2 PM | 85% | ±2°F | ±2°F | Tier 1 |
| KMIA | Miami | ET | Buy after 4 AM | 82% | Buy after 1 PM | 92% | ±2°F | ±1.5°F | Tier 1 |
| KLAX | LA | PT | Buy after 4 AM | 84% | Buy after 11 AM | 81% | ±2°F | ±1.5°F | Tier 1 |
| KAUS | Austin | CT | SKIP LOWS | 71% max | Buy after 2 PM | 84% | N/A | ±2°F | Tier 2 |
| KDEN | Denver | MT | SKIP LOWS | 60% max | Buy after 2 PM | 91% | N/A | ±2°F | Tier 2 |
| KPHL | Philly | ET | Buy after 5 AM | 80% | Buy after 2 PM | 89% | ±2°F | ±2°F | Tier 1 |
5 Tier 1 stations (lows + highs) · 2 Tier 2 stations (highs only)
Tier logic: Tier 1 stations have lows accuracy ≥ 80% and trade both contract types. Tier 2 stations (Austin, Denver) have lows accuracy below the profitability threshold and only trade highs. This prevents bleeding money on unreliable low forecasts. Maximum 12 trades per day: 5 lows + 7 highs.
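One way to encode this table is a single config object keyed by station; the field names below are illustrative sketches, not a fixed schema:
// Hypothetical per-station config entry — field names are illustrative, not a fixed schema
const STATIONS = {
  KMIA: {
    city: 'Miami',
    tz: 'America/New_York',
    tier: 1,
    lows:  { windowLocal: '04:00', accuracy: 0.82, buffer: 2.0 },
    highs: { windowLocal: '13:00', accuracy: 0.92, buffer: 1.5 },
  },
  KDEN: {
    city: 'Denver',
    tz: 'America/Denver',
    tier: 2,
    lows:  null, // skipped: 60% max accuracy is a coin flip
    highs: { windowLocal: '14:00', accuracy: 0.91, buffer: 2.0 },
  },
  // ...the remaining five stations follow the same shape
};
Adding a new city (Open Question 3) then means adding one entry here, backed by its own backtest.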
■ Decision Engine: Core Logic
The heart of the system is a pure deterministic function. No LLM in the hot path. No network calls beyond METAR and Kalshi. Same input always produces the same output. Executes in under 1 millisecond.
// The entire decision engine — pure math, no AI
const HOLD = 'HOLD';
const BUY_YES = 'BUY_YES';

function shouldTrade(station, contractType, metarAgeMinutes, bracket, kalshiPrice) {
  // 1. Check if we're in the trading window
  if (!isInTradingWindow(station, contractType)) return HOLD;

  // 2. Check data freshness (a stale observation means no trade)
  if (metarAgeMinutes > 20) return HOLD;

  // 3. Get the station-specific buffer
  const buffer = getBuffer(station, contractType);

  // 4. For LOWS: is the running min well inside the bracket?
  if (contractType === 'LOW') {
    const runningMin = getRunningMin(station);
    if (runningMin <= bracket.upper - buffer && runningMin >= bracket.lower + buffer) {
      // 5. Check the Kalshi price: is there actually an edge?
      if (kalshiPrice.yes < accuracy[station].lows / 100) {
        return BUY_YES;
      }
    }
  }

  // 6. For HIGHS: is the running max well inside the bracket?
  if (contractType === 'HIGH') {
    const runningMax = getRunningMax(station);
    if (runningMax <= bracket.upper - buffer && runningMax >= bracket.lower + buffer) {
      if (kalshiPrice.yes < accuracy[station].highs / 100) {
        return BUY_YES;
      }
    }
  }

  return HOLD;
}
Step-by-Step Breakdown
Step 1 — Trading Window Gate: Each station has a specific time after which trading is allowed. For example, KMIA highs open at 1 PM ET because backtest data shows 92% accuracy by that point. Before this time, the function returns HOLD unconditionally. This is the first line of defense against premature trades.
Step 2 — Data Freshness Check: METAR data older than 20 minutes is considered stale. Weather stations occasionally go offline or experience reporting delays. If the last observation is too old, we refuse to trade. This prevents acting on outdated information that could be degrees off from current conditions.
Step 3 — Station-Specific Buffer: The buffer is not a flat value. Miami and LA highs use ±1.5°F because their climates are less volatile. Standard stations use ±2°F. Volatile low temperature environments would use ±3°F. The buffer ensures the current reading is "well inside" the bracket, not just barely touching the edge.
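Concretely, the step-3 lookup can be a two-line function over the hypothetical STATIONS config sketched earlier:
// Minimal sketch of the step-3 buffer lookup, reusing the illustrative STATIONS config above
function getBuffer(station, contractType) {
  const side = contractType === 'HIGH' ? 'highs' : 'lows';
  return STATIONS[station][side].buffer; // e.g., 1.5 for KMIA highs, 2.0 standard
}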
Steps 4-5 — Lows Logic: For low temperature contracts, we track the running minimum temperature since midnight. If that running min sits comfortably inside the Kalshi bracket (accounting for buffer), we have a high-confidence read. But we only buy if the Kalshi YES price is less than our historical accuracy — that's the edge. If Miami lows are 82% accurate and the contract is priced at 75 cents, there's a 7-cent expected edge.
Step 6 — Highs Logic: Mirror image of lows. Track the running maximum temperature. If the running max is comfortably inside the bracket and the price leaves an edge, buy YES. Denver highs at 91% accuracy are extremely strong — if the contract is priced at 80 cents, that's an 11-cent expected edge per trade.
Critical Insight: This entire function is pure arithmetic. There is no LLM inference, no API call to Claude, no natural language interpretation. The decision is: Is the temperature inside the bracket with buffer? Is the price below our accuracy? If both YES, trade. Otherwise, hold. This is why the original Claude-in-the-loop design was replaced — an LLM adds latency, cost, and non-determinism to a problem that has an exact mathematical solution.
◼ System Architecture: Runtime Flow
The production system is a Node.js daemon running on Windows, polling data every 5 minutes. All decisions are made locally with no external AI calls in the critical path.
Core Trading Pipeline
IEM METAR API (data source, polled every 5 min) → Polling Daemon (Node.js process) → Rule Engine (<1 ms decisions) → Kalshi API (order placement) → D1 Trade Log (Cloudflare D1 storage)
Daily Claude Advisor Flow (Non-Critical Path)
6:00 AM ET cron trigger → Claude Daily Briefing (claude --print) → Weather Context (fronts, anomalies, flags) → Confidence Multipliers (adjusts the day's thresholds)
Separation of concerns: The Claude advisor runs once per day at 6:00 AM, well before any trading windows open. It analyzes weather fronts, identifies anomalous conditions (e.g., cold fronts sweeping through Denver), and can adjust confidence multipliers for the day. But it never touches the hot path. If Claude is down or slow, the bot trades normally with default confidence. The advisor is a nice-to-have, not a dependency.
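To guarantee the advisor stays out of the hot path, the multiplier read can default to neutral whenever the briefing is missing or stale; a sketch, with the briefing file shape hypothetical:
// Sketch: the advisor is optional — a missing or stale briefing degrades to neutral.
// `briefing` is whatever the 6 AM run wrote to disk (shape is hypothetical).
function getConfidenceMultiplier(briefing, station, todayIsoDate) {
  if (!briefing || briefing.date !== todayIsoDate) return 1.0; // advisor down: trade normally
  return briefing.multipliers?.[station] ?? 1.0;
}
A multiplier below 1.0 on a flagged day could, for example, shrink the effective accuracy used in the price-edge check, so the bot demands a cheaper contract before trading.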
◉ Data Pipeline (Observation-Only)
This strategy uses zero forecast data. We are not predicting the weather — we are reading the thermometer and betting that it won't move far. All data sources are observational or market-based.
Primary: IEM METAR API
https://mesonet.agron.iastate.edu/cgi-bin/request/asos.py
- Poll frequency: Every 5 minutes
- Data returned: Latest METAR observation (temperature, dewpoint, wind, timestamp)
- Coverage: All 7 stations (KMDW, KNYC, KMIA, KLAX, KAUS, KDEN, KPHL)
- Reliability: Free public API, operated by Iowa State University. Extremely stable — rarely goes down
- Latency: METAR observations are typically 5-20 minutes behind real-time (inherent in the reporting system)
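A polling sketch against this endpoint; the query parameter names and CSV shape below are from memory of the IEM ASOS service and should be verified against its documentation (including whether the end date is exclusive):
// Sketch: fetch today's temperature observations for one station from IEM.
// Parameter names and the station,valid,tmpf CSV shape are assumptions — verify
// against the IEM ASOS download docs before relying on this.
async function fetchTodayTemps(station) {
  const now = new Date();
  const tomorrow = new Date(now.getTime() + 24 * 60 * 60 * 1000);
  const url = new URL('https://mesonet.agron.iastate.edu/cgi-bin/request/asos.py');
  const params = {
    station,                                  // e.g., 'KMIA'
    data: 'tmpf',                             // air temperature in °F
    year1: now.getUTCFullYear(), month1: now.getUTCMonth() + 1, day1: now.getUTCDate(),
    year2: tomorrow.getUTCFullYear(), month2: tomorrow.getUTCMonth() + 1, day2: tomorrow.getUTCDate(),
    tz: 'Etc/UTC',
    format: 'onlycomma',                      // plain CSV, no debug header
  };
  for (const [k, v] of Object.entries(params)) url.searchParams.set(k, String(v));
  const res = await fetch(url);
  if (!res.ok) throw new Error(`IEM request failed: ${res.status}`);
  const rows = (await res.text()).trim().split('\n').slice(1); // drop CSV header
  return rows
    .map((r) => r.split(','))
    .filter(([, , tmpf]) => tmpf !== 'M')     // 'M' marks missing data
    .map(([stn, valid, tmpf]) => ({ station: stn, observedAt: valid, tempF: Number(tmpf) }));
}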
Settlement: IEM CF6 JSON API
https://mesonet.agron.iastate.edu/json/cf6.py
- Purpose: Post-settlement audit and accuracy tracking
- Data returned: Official CF6 climate data (daily max/min temperatures as reported by NWS)
- Usage: Compare our running min/max against official settlement values to track drift and accuracy
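A settlement-audit sketch; the station/year parameters and the results field shape are assumptions about this JSON service and should be confirmed with a live call:
// Sketch: pull official CF6 daily max/min for the post-settlement audit.
// Parameter and field names are assumptions — confirm against a live response.
async function fetchCf6Daily(station, year) {
  const res = await fetch(
    `https://mesonet.agron.iastate.edu/json/cf6.py?station=${station}&year=${year}`
  );
  if (!res.ok) throw new Error(`CF6 request failed: ${res.status}`);
  const { results } = await res.json();
  // Assumed row shape: { valid: 'YYYY-MM-DD', high: 87, low: 71, ... }
  return new Map(results.map((d) => [d.valid, { high: d.high, low: d.low }]));
}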
Market: Kalshi REST API
https://api.elections.kalshi.com/trade-api/v2/
- Purpose: Contract prices, order book depth, order placement
- Auth: API key-based authentication
- Rate limits: Generous for retail (specific limits TBD per Kalshi docs)
- Critical data: YES price for target brackets — used to determine if edge exists
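A price-lookup sketch against the v2 API; the /markets/{ticker} path and the yes_ask field (in cents) follow our reading of Kalshi's public docs but should be verified, and any required auth headers are omitted:
// Sketch: read the current YES ask for a bracket. Endpoint path and response
// field names (market.yes_ask, in cents) are assumptions — verify against
// Kalshi's trade-api/v2 docs; signed auth headers, if required, are omitted.
async function getYesPrice(marketTicker) {
  const res = await fetch(
    `https://api.elections.kalshi.com/trade-api/v2/markets/${marketTicker}`
  );
  if (!res.ok) throw new Error(`Kalshi request failed: ${res.status}`);
  const { market } = await res.json();
  return market.yes_ask / 100; // cents → dollars, e.g., 85 → 0.85
}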
Not Used
No forecast data. We do not use NWS forecasts, GFS model output, or any predictive weather data. The entire strategy is based on the observation that once the thermometer reads a value late enough in the day, it rarely moves outside the bracket. This is a nowcasting strategy, not a forecasting strategy.
Data Flow Timing
Weather observation → METAR report (typically 5-20 min behind real-time) → IEM ingest (+2-5 min delay) → 5-minute poll → rule engine → order placement
End-to-end latency: From weather observation to trade execution is approximately 7-25 minutes (dominated by METAR reporting delay + IEM ingest). This is perfectly acceptable for a strategy that trades once per day per station. We are not high-frequency trading — we are making one deliberate decision when the data is ripe.
▣ Reliability & Recovery (4-Layer Safety)
The bot runs on a Windows desktop machine. It must survive crashes, restarts, and transient failures without duplicating trades or missing trading windows.
Layer 1: Auto-Start
Windows Task Scheduler triggers the Node.js daemon on user login and on schedule. If the process crashes, the scheduler re-launches it. The daemon checks state.json on startup to restore context.
Layer 2: State Persistence
state.json is written to disk after every significant event (new METAR reading, trade executed, window entered). It persists:
- Running min/max per station for today
- Last METAR timestamp per station
- Today's executed trades (station + type + price)
- Daily Claude briefing results (confidence multipliers)
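A minimal write-on-change sketch; writing to a temp file and renaming keeps state.json intact even if the process dies mid-write (the path is illustrative):
// Sketch: atomic-ish state persistence — write a temp file, then rename,
// so a crash mid-write never leaves a truncated state.json behind.
const fs = require('node:fs/promises');
const STATE_PATH = 'state.json'; // illustrative path

async function saveState(state) {
  const tmp = `${STATE_PATH}.tmp`;
  await fs.writeFile(tmp, JSON.stringify(state, null, 2));
  await fs.rename(tmp, STATE_PATH); // rename replaces the old file in one step
}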
Layer 3: D1 Dedup
Cloudflare D1 database serves as the source of truth for trade history. The dedup key is {date}_{station}_{type} (e.g., 2026-02-19_KMIA_HIGH). Before placing any order, the bot checks D1. If a record exists for today's key, the trade is skipped. This prevents duplicates even if state.json is lost.
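The dedup gate could look like the sketch below; the trades table and dedup_key column are hypothetical names, and d1.query stands in for however the bot reaches D1 (its HTTP API or a Worker):
// Sketch: D1-backed dedup gate. Table/column names are hypothetical, and
// `d1.query` is a placeholder for the actual D1 client.
async function alreadyTraded(date, station, type) {
  const key = `${date}_${station}_${type}`; // e.g., 2026-02-19_KMIA_HIGH
  const rows = await d1.query(
    'SELECT 1 FROM trades WHERE dedup_key = ? LIMIT 1', [key]
  );
  return rows.length > 0; // existing record for today's key → skip the trade
}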
Layer 4: Discord Monitoring
Discord webhook provides real-time visibility:
- Trade alerts: Every executed trade is posted with station, bracket, price, and reasoning
- Error alerts: API failures, stale data warnings, circuit breaker triggers
- Daily heartbeat: 8:00 AM "I'm alive" message with status summary
- End-of-day report: P&L summary, trades executed, settlement comparison
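Discord webhooks accept a plain JSON POST with a content field, so the alert layer reduces to one helper (webhook URL sourced from an environment variable here):
// Sketch: post an alert to the configured Discord webhook.
async function discordSend(content) {
  const res = await fetch(process.env.DISCORD_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content }), // e.g., 'BUY_YES KMIA_HIGH @ $0.85, edge 0.07'
  });
  if (!res.ok) console.error(`Discord alert failed: ${res.status}`);
}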
Circuit Breakers
| Condition | Action | Recovery |
| --- | --- | --- |
| Kalshi API errors | Exponential backoff | 1s, 2s, 4s, 8s... max 5 min |
| IEM stale data (>20 min) | Pause station | Resume when fresh METAR arrives |
| 5 consecutive losses | Full halt | Manual review required |
| Daily loss limit hit | Full halt | Auto-resume next trading day |
| D1 unreachable | Degrade | Fall back to state.json dedup |
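The first row of the table could be implemented as a capped exponential retry, a minimal sketch:
// Sketch: exponential backoff for Kalshi API calls — 1s, 2s, 4s... capped at 5 min.
async function withBackoff(fn, maxDelayMs = 5 * 60 * 1000) {
  let delay = 1000;
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      console.error(`Kalshi call failed, retrying in ${delay / 1000}s: ${err.message}`);
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay = Math.min(delay * 2, maxDelayMs); // double each attempt, cap at 5 min
    }
  }
}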
// Crash recovery flow on startup
async function initialize() {
  // 1. Load local state (empty default if state.json is missing)
  const state = await loadStateJson();

  // 2. Verify against D1 (source of truth)
  const todaysTrades = await d1.query('SELECT * FROM trades WHERE date = ?', [today]);

  // 3. Reconcile: D1 wins on conflicts
  state.trades = reconcile(state.trades, todaysTrades);

  // 4. If running min/max is stale, fetch recent METARs to rebuild
  const thirtyMinutesAgo = Date.now() - 30 * 60 * 1000;
  if (state.lastMetar < thirtyMinutesAgo) {
    state.runningMinMax = await rebuildFromIEM(today);
  }

  // 5. Post Discord heartbeat
  await discord.send(`Bot restarted. State recovered. ${todaysTrades.length} trades already placed.`);
}
⚠ Risk Management: Capital Protection
Conservative position sizing designed for capital preservation. The bot is not designed to get rich — it's designed to grind out small edges consistently without ever blowing up.
Position Limits
| Parameter | Value | Rationale |
| --- | --- | --- |
| Max per contract | $5-$10 | Configurable, starts at $5 |
| Trades per station/day | 1 | One trade per city per type per day |
| Max daily exposure | $60-$120 | 12 trades × $5-$10 each |
| Daily loss limit | Configurable | Triggers full halt for the day |
What We Do NOT Do
- ✗ No shorting — we only buy YES contracts on brackets we're confident about
- ✗ No leveraging — each trade is a fixed dollar amount, no margin
- ✗ No doubling down — if a trade is placed, we don't add to the position
- ✗ No martingale — losses don't increase bet size; they trigger circuit breakers
- ✗ No forecasting — we don't bet on weather that hasn't happened yet
Edge Requirement
Minimum price edge: A trade is only placed when the Kalshi YES contract price is lower than the historical accuracy for that station and contract type. This ensures positive expected value over time.
// Example: KMIA highs
const historicalAccuracy = 0.92; // 92% backtest accuracy
const kalshiYesPrice = 0.85;     // 85 cents
const expectedValue = historicalAccuracy - kalshiYesPrice; // 0.92 - 0.85 = +0.07
// +7 cents per $1 of contract payout
// On a $10 trade: EV = +$0.70 per trade
// Over 30 days: EV = +$21 from KMIA highs alone
Worst-Case Scenario
Maximum single-day loss: If every trade loses (all 12 contracts settle wrong), maximum loss is $60-$120 depending on position size. The 5-consecutive-loss circuit breaker would halt trading well before this point in practice. Over a month, even a 20% loss rate on $5 trades means only ~$72 in losses against expected gains of ~$200+ (if prices offer edges).
▶ Implementation Roadmap (4 Phases)
Strictly phased implementation. Phase 1 is the go/kill gate — if pricing data doesn't show an edge, the project is killed before any code is written.
Phase 1: Pricing Validation (CRITICAL — Before Writing Any Code)
This is the kill gate. Everything else is contingent on proving that an edge exists in Kalshi's pricing. High accuracy is meaningless if the market already prices it in.
- Pull historical Kalshi price data for all 7 stations — ideally several months of contract prices at the times we would trade
- Backtest profitability: for each station/window/contract type, compute the expected edge (historical accuracy minus average YES price, per the Edge Requirement above) across all historical days, as sketched below
- Determine actual edge: Does buying YES at 2 PM for KDEN highs (91% accurate) actually pay off after Kalshi's fees and spreads?
- Kill/go decision: If expected value per trade is < $0.02, the project is not viable at current position sizes. Either find higher edges or shelve the project.
Key Phase 1 metric, currently unknown: average YES price at trade time.
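A sketch of the go/kill computation, assuming Phase 1 produces one row per historical day per station/type with the YES price at our trade time and the settlement outcome (the row shape and input names are hypothetical):
// Sketch: measured edge per station/type from historical prices.
// Row shape is hypothetical: { station, type, yesPriceAtWindow, won }
function measuredEdge(rows, feePerContract = 0) {
  const winRate = rows.filter((r) => r.won).length / rows.length;
  const avgPrice = rows.reduce((sum, r) => sum + r.yesPriceAtWindow, 0) / rows.length;
  return winRate - avgPrice - feePerContract; // edge per $1 of contract payout
}

// Go/kill gate: under $0.02 of edge, the station/type is shelved
const verdict = measuredEdge(kdenHighRows) >= 0.02 ? 'GO' : 'KILL'; // kdenHighRows: hypothetical input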
Phase 2: Build Core System
Only proceed if Phase 1 confirms positive expected value. Build the minimum viable trading system.
- Node.js daemon with the deterministic rule engine as described in Section 5
- METAR polling module — fetches from IEM every 5 min, updates running min/max per station
- Kalshi API integration — authentication, contract lookup, order placement, position tracking
- D1 trade logging — Cloudflare D1 database for trade history and dedup
- Discord notifications — webhook integration for trade alerts, errors, and heartbeat
Phase 3: Harden (Production Readiness)
Make the system resilient to real-world failures. No trade should be lost or duplicated due to infrastructure issues.
- Windows Task Scheduler auto-start configuration
- State persistence — state.json write-on-change with crash recovery logic
- Circuit breakers — exponential backoff, stale data pause, consecutive loss halt
- Claude daily briefing — 6:00 AM advisor integration for confidence multipliers
- Post-settlement audit — compare trades against CF6 official data, track actual vs. expected accuracy
Phase 4: Go Live (Graduated Deployment)
Slow, methodical deployment. Start with paper trading, graduate to real money, scale up only with proven results.
- Paper trading for 2 weeks — bot runs live but does not place real orders; logs what it would have traded
- Scale to $5/trade real money — minimum position size, monitor daily P&L closely
- Monitor and adjust — tune buffers, windows, and confidence multipliers based on live performance
- Graduate to full position sizing — $10/trade after 30 days of proven profitability
Phase 1: Price Validation (1-2 weeks) → Phase 2: Build Core System → Phase 3: Harden → Phase 4: Go Live (2 weeks paper + scale)
? Open Questions (Must Resolve)
These questions must be answered during Phase 1 before committing to building the full system. Each directly impacts the viability of the project.
1. What does the Kalshi order book look like at 2 PM / 5 AM?
This determines whether an edge actually exists at the times we want to trade. If the order book is thin or the spread is wide, our theoretical edge may evaporate in execution. Need to observe live order books across multiple days to understand typical liquidity and spreads at our target trading windows.
2. Can we get historical Kalshi prices via API or do we need to scrape?
Phase 1 requires historical contract prices at specific times of day. If Kalshi's API provides historical snapshots, this is straightforward. If not, we may need to build a scraper or poll live data for 2-4 weeks before we can run the profitability backtest. This directly impacts Phase 1 timeline.
3. Should we add more cities if Kalshi expands?
The architecture supports adding new stations easily (just add a config entry). But each new station needs its own backtest analysis to determine accuracy, optimal windows, and buffer sizes. No station should be traded without at least 60 days of historical accuracy data proving ≥ 80% reliability.
4. What's the optimal position sizing model after we have price data?
Currently using flat $5-$10 per trade. With price data, we could implement Kelly Criterion or fractional Kelly sizing to optimize capital allocation. Higher-edge stations (KMIA highs at 92%) could justify larger positions than lower-edge stations (KLAX highs at 81%). This optimization is a Phase 4 concern.
Bottom line: Questions 1 and 2 are Phase 1 blockers. Until we know what Kalshi prices look like at our target trading times, we cannot determine if the strategy is profitable. Everything else is academic until that data is in hand.