🐸 Version 3.12.2
February 26, 2026
Summary
Infrastructure reliability update. Fixed shard reporting inconsistencies that caused user/server counts to fluctuate wildly, corrected LXC container memory and CPU metrics on the admin dashboard, added automatic background metrics collection for queue throughput charts, and introduced a new Bot Growth chart to track users, servers, and translations over time.
Bug Fixes
Shard Reporting Stability
- Partial shard detection — The global stats updater now counts how many shards returned valid (non-zero) data. If fewer than 75% of shards reported, the update is rejected and the previous values are kept, preventing incomplete counts from being persisted.
- Suspicious drop rejection — If either the user count or the guild count drops by more than 20% compared to the last recorded value, the update is rejected. Genuine change between 30-minute update intervals is well under 1%, so a drop of more than 20% indicates shard failures rather than real movement.
- Better logging — Each stats update now logs the raw shard counts, how many shards reported valid data, and the expected total, making it much easier to diagnose future issues.
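
The two acceptance checks above can be sketched as a single pure function. This is a minimal illustration, not the updater's actual code: the `ShardStats` and `GlobalStats` shapes, the `evaluateStatsUpdate` name, and the choice to treat a shard as valid when its guild count is non-zero are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real updater's types are not shown in these notes.
interface ShardStats { guilds: number; users: number; }
interface GlobalStats { guilds: number; users: number; }

type Verdict =
  | { accepted: true; stats: GlobalStats }
  | { accepted: false; reason: string };

const MIN_SHARD_RATIO = 0.75; // at least 75% of expected shards must report
const MAX_DROP_RATIO = 0.2;   // reject drops of more than 20%

function evaluateStatsUpdate(
  shards: ShardStats[],
  expectedShards: number,
  previous: GlobalStats | null,
): Verdict {
  // Assumption: a shard is "valid" when it reports a non-zero guild count.
  const valid = shards.filter((s) => s.guilds > 0);
  if (valid.length < expectedShards * MIN_SHARD_RATIO) {
    return {
      accepted: false,
      reason: `only ${valid.length}/${expectedShards} shards reported valid data`,
    };
  }

  const stats: GlobalStats = {
    guilds: valid.reduce((sum, s) => sum + s.guilds, 0),
    users: valid.reduce((sum, s) => sum + s.users, 0),
  };

  if (previous !== null) {
    // Relative drop against the last recorded value; growth yields a negative number.
    const drop = (before: number, after: number): number =>
      before > 0 ? (before - after) / before : 0;
    if (
      drop(previous.users, stats.users) > MAX_DROP_RATIO ||
      drop(previous.guilds, stats.guilds) > MAX_DROP_RATIO
    ) {
      return { accepted: false, reason: "count dropped by more than 20%" };
    }
  }

  return { accepted: true, stats };
}
```

When the verdict is a rejection, the caller keeps the previously stored values and logs the reason together with the raw shard counts.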
LXC Container Monitoring Accuracy
- RAM reporting fixed — Glances running inside LXC containers was reporting the host's memory usage against the container's memory limit, producing impossible values like 420% RAM. The dashboard now detects when `used > total` and recalculates from `/proc/meminfo` fields that LXC properly virtualizes.
- CPU core count fixed — Glances was reporting the host's physical core count (e.g., 28 cores) instead of the container's allocated logical CPUs (e.g., 4 cores). The dashboard now prefers logical core count over physical.
- Progress bar clamping — RAM progress bars are now clamped to 100% to prevent visual overflow even if stale data slips through.
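
A rough sketch of the three corrections, under stated assumptions: the notes do not say which `/proc/meminfo` fields the dashboard reads, so `MemTotal` and `MemAvailable` are assumed here, and the `MemoryReading` shape and all function names are hypothetical.

```typescript
// Hypothetical reading shape; values are in bytes.
interface MemoryReading { usedBytes: number; totalBytes: number; }

// Parse "Name:   12345 kB" lines from /proc/meminfo into a name -> bytes map.
function parseMeminfo(text: string): Map<string, number> {
  const fields = new Map<string, number>();
  for (const line of text.split("\n")) {
    const match = /^(\w+):\s+(\d+)\s*kB/.exec(line);
    if (!match) continue;
    const [, name, kb] = match;
    if (name !== undefined && kb !== undefined) {
      fields.set(name, Number(kb) * 1024);
    }
  }
  return fields;
}

// If the reported reading is impossible (used > total), recompute it from
// meminfo. Assumption: used = MemTotal - MemAvailable.
function correctMemory(reported: MemoryReading, meminfoText: string): MemoryReading {
  if (reported.usedBytes <= reported.totalBytes) return reported;
  const fields = parseMeminfo(meminfoText);
  const total = fields.get("MemTotal");
  const available = fields.get("MemAvailable");
  if (total === undefined || available === undefined) return reported;
  return { usedBytes: total - available, totalBytes: total };
}

// Clamp the progress bar to 0..100 so stale data cannot overflow it.
function ramPercentForBar(reading: MemoryReading): number {
  if (reading.totalBytes <= 0) return 0;
  const percent = (reading.usedBytes / reading.totalBytes) * 100;
  return Math.min(100, Math.max(0, percent));
}

// Prefer the logical core count; fall back to physical if it is missing.
function pickCoreCount(cores: { logical?: number; physical?: number }): number | undefined {
  return cores.logical ?? cores.physical;
}
```

The clamp is kept separate from the correction so that it still protects the display when the meminfo fields are unavailable.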
New Features
Bot Growth Chart
- Historical tracking — A new `BotStatsSnapshot` database model records guild count, user count, total translations, premium translations, auto-translate completions, and conversation bridge completions approximately every 5 minutes.
- Growth visualization — A new chart on the admin dashboard plots user count, server count, and translation volume over time with selectable time ranges (24h, 7d, 30d, MTD, QTD, YTD).
- Delta badges — Summary badges show the current value and the change (growth/decline) over the selected time range.
- Metric toggles — Click the metric labels below the chart to show or hide individual data series.
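
One way the time ranges and delta badges could be computed is sketched below. The snapshot field names, the function names, and the use of UTC calendar boundaries for MTD, QTD, and YTD are assumptions for illustration; the dashboard's actual logic is not described in these notes.

```typescript
type TimeRange = "24h" | "7d" | "30d" | "MTD" | "QTD" | "YTD";
type Metric = "userCount" | "guildCount" | "totalTranslations";

// Hypothetical subset of the snapshot fields used by the chart.
interface GrowthSnapshot {
  timestamp: Date;
  userCount: number;
  guildCount: number;
  totalTranslations: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Start of the selected window. Calendar ranges use UTC boundaries (assumption).
function rangeStart(range: TimeRange, now: Date): Date {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  switch (range) {
    case "24h": return new Date(now.getTime() - DAY_MS);
    case "7d": return new Date(now.getTime() - 7 * DAY_MS);
    case "30d": return new Date(now.getTime() - 30 * DAY_MS);
    case "MTD": return new Date(Date.UTC(year, month, 1));
    case "QTD": return new Date(Date.UTC(year, month - (month % 3), 1));
    case "YTD": return new Date(Date.UTC(year, 0, 1));
  }
}

// Current value plus the change since the earliest snapshot in the window.
function deltaBadge(
  snapshots: GrowthSnapshot[],
  metric: Metric,
  range: TimeRange,
  now: Date,
): { current: number; delta: number } | null {
  const start = rangeStart(range, now).getTime();
  const inRange = snapshots
    .filter((s) => s.timestamp.getTime() >= start && s.timestamp.getTime() <= now.getTime())
    .sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());
  const first = inRange[0];
  const last = inRange[inRange.length - 1];
  if (first === undefined || last === undefined) return null;
  return { current: last[metric], delta: last[metric] - first[metric] };
}
```

A positive `delta` would render as growth and a negative one as decline; `null` means the window contains no snapshots yet.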
Automatic Queue Metrics Collection
- Background instrumentation — Queue throughput metrics (Translation Queue Throughput and System Queue Activity) are now automatically collected every 60 seconds via a Next.js `instrumentation.ts` hook, eliminating the need for an external cron job.
- Shared collector — The metrics collection logic was extracted into a reusable service module used by both the background interval and the manual cron endpoint.
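
A minimal sketch of how such a hook might be wired, assuming the 15-second initial delay and 60-second interval listed under Technical Details. Next.js calls the exported `register` function once when a server instance starts; `startBackgroundCollection` and `collectQueueMetrics` are hypothetical names, and the overlap guard is an addition for the example rather than something the notes describe.

```typescript
type Collector = () => Promise<void>;

// Run `collect` once after an initial delay, then on a fixed interval.
// Returns a function that cancels all pending timers.
function startBackgroundCollection(
  collect: Collector,
  initialDelayMs = 15_000,
  intervalMs = 60_000,
): () => void {
  let running = false;
  let interval: ReturnType<typeof setInterval> | undefined;

  const tick = async (): Promise<void> => {
    if (running) return; // skip this cycle if the previous one is still in flight
    running = true;
    try {
      await collect();
    } catch (err) {
      // A failed cycle must not stop future cycles.
      console.error("metrics collection failed", err);
    } finally {
      running = false;
    }
  };

  const initial = setTimeout(() => {
    void tick();
    interval = setInterval(() => void tick(), intervalMs);
  }, initialDelayMs);

  return () => {
    clearTimeout(initial);
    if (interval !== undefined) clearInterval(interval);
  };
}

// Hypothetical stand-in for the shared collector service module.
async function collectQueueMetrics(): Promise<void> {
  // Read queue counters and persist one metrics row here.
}

// src/instrumentation.ts entry point, invoked by Next.js on server startup.
export async function register(): Promise<void> {
  startBackgroundCollection(collectQueueMetrics);
}
```

Because the collector is passed in as a plain async function, the manual cron endpoint can call the same function directly.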
Infrastructure
- Added Glances monitoring to all 13 LXC containers and bare-metal hosts.
- Configured DNS resolution inside Docker Swarm containers to use the Pi-hole DNS server for `.lab` domain resolution.
- All Glances instances now run as systemd services using Python virtual environments.
- Dashy dashboard on Frigg updated with status widgets for all monitored servers.
Technical Details
- Prisma: New `BotStatsSnapshot` model with timestamp index
- Next.js instrumentation: `src/instrumentation.ts` runs collection on server boot (15s initial delay, then every 60s)
- Shard threshold: 75% of expected shards must report non-zero for an update to be accepted
- Drop threshold: 20% maximum acceptable decrease between consecutive updates
- Snapshot frequency: ~5 minutes on average (probabilistic: each 60-second metric collection cycle has a 1-in-5 chance of writing a snapshot)
- Data retention: 90 days for both queue metrics and bot stats snapshots
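
The snapshot sampling and retention figures above can be illustrated with a short sketch. It assumes the 1-in-5 chance is an independent random draw per collection cycle and that pruning deletes rows older than a cutoff; all names are hypothetical.

```typescript
const SNAPSHOT_EVERY_N_CYCLES = 5; // 1-in-5 chance per collection cycle
const RETENTION_DAYS = 90;
const DAY_MS = 24 * 60 * 60 * 1000;

// Decide whether this collection cycle should also write a bot stats snapshot.
// The random source is injectable so the decision can be tested deterministically.
function shouldTakeSnapshot(random: () => number = Math.random): boolean {
  return random() < 1 / SNAPSHOT_EVERY_N_CYCLES;
}

// Average spacing between snapshots for a given cycle length.
function expectedSnapshotIntervalSeconds(cycleSeconds: number): number {
  return cycleSeconds * SNAPSHOT_EVERY_N_CYCLES;
}

// Rows with a timestamp older than this cutoff fall outside the retention window.
function retentionCutoff(now: Date): Date {
  return new Date(now.getTime() - RETENTION_DAYS * DAY_MS);
}
```

With 60-second cycles the expected spacing is 300 seconds, which matches the "~5 minutes" figure, although individual gaps vary. A pruning job could pass the cutoff to a delete query filtered on the snapshot timestamp.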