EZ-Wallet — Solana Trading Automation
Autonomous, risk-managed order execution across 5+ Solana DEXs, built so a degraded RPC or a dead worker doesn't become a missed or doubled trade.
- Rust
- Axum
- Tokio
- Redis Streams
- ClickHouse
- SQLite
- Kubernetes
- gRPC
- Jito
Problem
Run trading strategies 24/7 with no human in the loop, on a chain where the market moves inside a ~400ms slot and RPC endpoints degrade exactly when volatility is highest. A late order is a wrong order; a double-sent order is worse.
Constraints
- Hot-path budget: sub-second end-to-end (~100–250ms typical).
- Solana slot ≈400ms — execution must be slot-accurate, not just 'fast'.
- RPC reliability drops under the exact volatility the strategies trade.
- Multi-wallet: encrypted key storage, no cross-wallet bleed.
- No missed trades and no double-executes — correctness over throughput.
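A minimal sketch of the "no double-executes" constraint, assuming each order carries a unique numeric id and that a worker must claim the id before it sends. `SendGuard` and `try_claim` are hypothetical names, and the in-memory set stands in for whatever durable store would be needed for the claim to survive a dead worker.

```rust
use std::collections::HashSet;

/// Hypothetical guard: an order id can be claimed for sending at most once.
/// In-memory only here; a real claim would have to be persisted so that a
/// restarted worker does not re-send an order that was already claimed.
#[derive(Default)]
pub struct SendGuard {
    claimed: HashSet<u64>,
}

impl SendGuard {
    /// Returns true only for the first claim of `order_id`.
    /// `HashSet::insert` returns false when the value was already present,
    /// so a second caller sees `false` and must not send.
    pub fn try_claim(&mut self, order_id: u64) -> bool {
        self.claimed.insert(order_id)
    }
}
```

The caller sends only when `try_claim` returns true, so a retry or a second worker picking up the same order becomes a no-op rather than a duplicate transaction.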
Shape
- CREATED
- PENDING
- CONFIRMED
- SETTLED
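The four states above can be sketched as a small state machine. This assumes the lifecycle is strictly linear, with each state advancing only to the next one listed; failure or expiry states are not named in this write-up, so none are modeled. `OrderState`, `InvalidTransition`, and the method names are illustrative.

```rust
/// Order lifecycle states, in the order listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OrderState {
    Created,
    Pending,
    Confirmed,
    Settled,
}

/// Returned when a caller asks for a transition the lifecycle does not allow.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: OrderState,
    pub to: OrderState,
}

impl OrderState {
    /// The only legal successor, or `None` for the terminal state.
    pub fn next(self) -> Option<OrderState> {
        match self {
            OrderState::Created => Some(OrderState::Pending),
            OrderState::Pending => Some(OrderState::Confirmed),
            OrderState::Confirmed => Some(OrderState::Settled),
            OrderState::Settled => None,
        }
    }

    /// Move to `to` only if it is the immediate successor of `self`.
    /// Skipping a state or repeating one is rejected.
    pub fn transition(self, to: OrderState) -> Result<OrderState, InvalidTransition> {
        if self.next() == Some(to) {
            Ok(to)
        } else {
            Err(InvalidTransition { from: self, to })
        }
    }
}
```

Rejecting skipped and repeated transitions at the type boundary means a stale or replayed event cannot move an order straight from CREATED to SETTLED.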
Decisions & tradeoffs
Rust + Axum/Tokio for the backend (not Node)
Predictable tail latency and real concurrency for the daemon and 60+ APIs.
Tradeoff: Slower feature velocity and a smaller ecosystem to lean on.
Event-driven 6-worker daemon (not request/response)
Sub-millisecond trigger evaluation across PumpFun, PumpSwap, Raydium, Meteora.
Tradeoff: Real coordination and observability complexity; failures are now distributed.
3-tier data: SQLite + Redis Streams + ClickHouse
Each access pattern (transactional state / event flow / time-series audit) gets the right tool.
Tradeoff: Three systems to operate and keep consistent instead of one.
gRPC → RPC fallback → timeout watchdog confirmation
An order is only reported settled when it provably is — false settles are unacceptable.
Tradeoff: Added confirmation latency into an already tight budget.
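A rough sketch of the confirmation chain, under simplifying assumptions: both sources are modeled as polled closures (a real gRPC source would more likely be a stream subscription), and the watchdog is a plain deadline. `Probe`, `Outcome`, and `confirm` are hypothetical names, not the daemon's actual API. The point it illustrates is that a timeout yields an explicit "unknown" outcome rather than a settle.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// What one confirmation source reports for a transaction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Probe {
    Confirmed,
    NotYet,
    Unavailable,
}

/// Result of the whole confirmation attempt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Settled,
    /// Watchdog fired: the order's fate is unknown and must be reconciled.
    /// It is never reported as settled.
    TimedOut,
}

/// Poll `primary`; consult `fallback` only when the primary is unavailable.
/// Gives up once `timeout` has elapsed without a confirmation.
pub fn confirm<P, F>(
    mut primary: P,
    mut fallback: F,
    timeout: Duration,
    poll_every: Duration,
) -> Outcome
where
    P: FnMut() -> Probe,
    F: FnMut() -> Probe,
{
    let deadline = Instant::now() + timeout;
    loop {
        let probe = match primary() {
            Probe::Unavailable => fallback(),
            other => other,
        };
        if probe == Probe::Confirmed {
            return Outcome::Settled;
        }
        if Instant::now() >= deadline {
            return Outcome::TimedOut;
        }
        thread::sleep(poll_every);
    }
}
```

`Outcome::Settled` is reachable only through an explicit `Probe::Confirmed`, which is the property the decision above asks for; the polling interval is where the added confirmation latency comes from.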
Outcome
Incident
Incident report
- What happened
- During a sharp volatility spike, RPC latency climbed and the daemon's trigger evaluation fell behind the slot clock. A batch of conditional orders evaluated late and missed their intended slot — the system was fast enough on an average minute and not on the worst one.
- What it changed
- Average-case latency was a lie I'd been telling myself. I now budget against the worst minute, not the mean: I added RPC-degradation detection with fallback routing, plus queue-aware load shedding ahead of trigger evaluation, so falling behind degrades gracefully instead of silently dropping slots.
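To make the worst-minute idea concrete, here is a minimal sketch assuming RPC round-trip latencies are sampled into a fixed-size rolling window and each queued trigger carries a deadline in milliseconds. All names (`LatencyWindow`, `choose_route`, `Trigger`, `shed_late`) and the choice of "worst sample in the window" as the degradation signal are illustrative assumptions, not the daemon's actual logic.

```rust
use std::collections::VecDeque;

/// Rolling window of recent RPC round-trip latencies, in milliseconds.
pub struct LatencyWindow {
    samples: VecDeque<u64>,
    capacity: usize,
}

impl LatencyWindow {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window must hold at least one sample");
        Self { samples: VecDeque::with_capacity(capacity), capacity }
    }

    /// Record a sample, evicting the oldest once the window is full.
    pub fn record(&mut self, latency_ms: u64) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(latency_ms);
    }

    /// Worst latency currently in the window, or `None` if it is empty.
    pub fn worst(&self) -> Option<u64> {
        self.samples.iter().copied().max()
    }

    /// Integer mean, kept only to contrast with `worst`.
    pub fn mean(&self) -> Option<u64> {
        if self.samples.is_empty() {
            return None;
        }
        Some(self.samples.iter().sum::<u64>() / self.samples.len() as u64)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    Primary,
    Fallback,
}

/// Route on the worst sample in the window, not the mean.
pub fn choose_route(window: &LatencyWindow, budget_ms: u64) -> Route {
    match window.worst() {
        Some(worst) if worst > budget_ms => Route::Fallback,
        _ => Route::Primary,
    }
}

/// A queued trigger with the latest time at which evaluating it is still useful.
#[derive(Debug, PartialEq, Eq)]
pub struct Trigger {
    pub order_id: u64,
    pub deadline_ms: u64,
}

/// Drop triggers whose deadline has already passed and return how many were
/// shed, so the loss is counted instead of happening silently.
pub fn shed_late(queue: &mut VecDeque<Trigger>, now_ms: u64) -> usize {
    let before = queue.len();
    queue.retain(|t| t.deadline_ms > now_ms);
    before - queue.len()
}
```

With samples of 40, 45, 50, and 900 ms and a 300 ms budget, the mean (258 ms) looks healthy while the worst sample does not, so `choose_route` picks the fallback. That gap between mean and worst is the failure mode the incident exposed.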
What I'd do differently
SQLite for transactional state was the right bet for a single-node start and the wrong one as wallet count grows — I'd reach for Postgres earlier. I'd also have built the volatility load-shedding before the first incident forced it, not after.