# Firmware OTA Strategy for ESP32-S3 Provider
**Authored by Antigravity**
**Date:** 2026-03-03

---
## 1. Goal
Implement a robust Over-The-Air (OTA) update mechanism for both the main firmware of the ESP32-S3 and the Svelte frontend. The update must:
- Update the core application logic and the user interface without requiring a physical USB connection.
- Keep the Firmware and Frontend in sync by allowing them to be updated together atomically.
- Provide a reliable fallback if an update fails (Rollback capability via A/B slots).
- Provide a permanent "factory" fallback as an extreme safety measure.
- Prevent accidental cross-flashing (e.g., flashing UI to firmware slots).
- Maintain a clear versioning scheme visible to the user, with accurate partition space reporting.
## 2. Chosen Approach
We implemented a **Universal Dual-Partition OTA system** using ESP-IDF's native OTA mechanisms for the firmware and LittleFS for the frontend.
Updates can be performed individually (Firmware only via `.bin`, Frontend only via `.bin`), but the primary and recommended approach is the **Universal OTA Bundle**.
The build process generates a single `.bundle` file containing both the firmware image and the compiled frontend filesystem. This bundle is uploaded via the frontend UI and streamed directly to the inactive OTA flash partition (`ota_0` or `ota_1`) and the inactive UI partition (`www_0` or `www_1`). Once both components have been transferred and validated, the handler sets the new boot partition and updates the UI partition index in NVS, so both switch together on the next restart.
## 3. Design Decisions & Trade-offs
### 3.1. Why Dual-Partition (A/B) with Factory?
- **Safety**: A failed or interrupted upload never "bricks" the device.
- **Factory Fallback**: By maintaining a dedicated 2MB `factory` partition alongside the two 2MB OTA partitions (`ota_0`, `ota_1`), we ensure that even if both OTA slots are irrecoverably corrupted, the device can always boot into a known-good state.
- **Frontend Sync**: The frontend also uses a dual-partition layout (`www_0`, `www_1`). The Universal Bundle ensures both FW and UI switch together.
### 3.2. Automatic App Rollback
We rely on ESP-IDF's built-in "App Rollback" feature.
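- **The Mechanism**: When the ESP32 first boots a newly OTA-flashed firmware, the bootloader marks the image as "Pending Verify" (`ESP_OTA_IMG_PENDING_VERIFY`). If the device resets before the application marks itself as "valid" (for example, because it crashed), the bootloader reverts to the previous working partition on the next boot. This requires `CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE` to be set.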
- **Validation Point**: We consider the firmware "valid" only after it successfully establishes a network connection.
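
The rollback flow above can be modelled on the host as a small state machine. This is a simplified sketch, not ESP-IDF code: `Device`, `ota_commit`, `bootloader_boot`, and `mark_valid` are hypothetical names, and the enum mirrors only the `ESP_OTA_IMG_NEW`, `ESP_OTA_IMG_PENDING_VERIFY`, `ESP_OTA_IMG_VALID`, and `ESP_OTA_IMG_ABORTED` states. The real bootloader selects the fallback slot from `otadata`; the model simply remembers the previous slot.

```cpp
#include <cassert>

// Simplified host-side model of the rollback states of the two OTA slots.
enum class ImgState { New, PendingVerify, Valid, Aborted };

struct Device {
    int running = 0;   // slot selected for boot (0 = ota_0, 1 = ota_1)
    int previous = 0;  // last slot that was known to work
    ImgState state[2] = {ImgState::Valid, ImgState::Valid};
};

// The OTA handler finished writing `slot` and selected it for the next boot.
void ota_commit(Device& d, int slot) {
    assert(slot == 0 || slot == 1);
    d.previous = d.running;
    d.running = slot;
    d.state[slot] = ImgState::New;
}

// What the bootloader does on every reset (hypothetical simplification).
void bootloader_boot(Device& d) {
    ImgState& s = d.state[d.running];
    if (s == ImgState::New) {
        s = ImgState::PendingVerify;  // first boot of the new image
    } else if (s == ImgState::PendingVerify) {
        s = ImgState::Aborted;        // never confirmed: roll back
        d.running = d.previous;
    }
}

// The application confirms itself, e.g. once the network is up.
void mark_valid(Device& d) {
    d.state[d.running] = ImgState::Valid;
}
```

On the device, `mark_valid` corresponds to calling `esp_ota_mark_app_valid_cancel_rollback()`. A reset between the first boot and that call is what triggers the revert.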
### 3.3. Universal Bundle Format & Automation
- **Format**: A custom 12-byte header (4-byte `BNDL` magic + 4-byte FW size + 4-byte UI size), followed by the FW binary and then the UI binary.
- **Automation**: The Svelte build chain automates packaging. Running `npm run ota:bundle` runs the Vite production build, packages the frontend into a LittleFS image, sorts the built UI images by semantic version (so the latest compiled UI is always picked), and generates the `.bundle` payload.
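
As an illustration of the 12-byte header, the sketch below serializes and parses it on the host. The byte order of the two size fields is not stated above; little-endian is an assumption here, and `BundleHeader`, `make_bundle_header`, and `parse_bundle_header` are hypothetical names rather than the project's actual API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kBundleHeaderSize = 12;  // "BNDL" + FW size + UI size

struct BundleHeader {
    std::uint32_t fw_size;  // length of the firmware image in bytes
    std::uint32_t ui_size;  // length of the LittleFS image in bytes
};

// ASSUMPTION: sizes are stored little-endian.
inline std::uint32_t read_le32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

inline void write_le32(std::uint8_t* p, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        p[i] = static_cast<std::uint8_t>((v >> (8 * i)) & 0xFFu);
    }
}

// Builds the header that precedes the FW and UI payloads.
std::array<std::uint8_t, kBundleHeaderSize> make_bundle_header(
        std::uint32_t fw_size, std::uint32_t ui_size) {
    std::array<std::uint8_t, kBundleHeaderSize> h{};
    std::memcpy(h.data(), "BNDL", 4);
    write_le32(h.data() + 4, fw_size);
    write_le32(h.data() + 8, ui_size);
    return h;
}

// Returns false if the buffer is too short or the magic does not match.
bool parse_bundle_header(const std::uint8_t* data, std::size_t len,
                         BundleHeader& out) {
    assert(data != nullptr);
    if (len < kBundleHeaderSize || std::memcmp(data, "BNDL", 4) != 0) {
        return false;
    }
    out.fw_size = read_le32(data + 4);
    out.ui_size = read_le32(data + 8);
    return true;
}
```

With this layout, the firmware payload starts at byte offset 12 and the UI payload at offset `12 + fw_size`.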
### 3.4. Safety & Validation
- **Magic Number Checks**: The backend validates every upload before writing to flash. Firmware endpoints and the firmware section of a bundle stream must start with the ESP32 image magic byte (`0xE9`), and bundle endpoints must see the `BNDL` magic header. This prevents a user from accidentally uploading a LittleFS image to a firmware slot, which would otherwise cause an immediate boot loop.
- **Atomic Commits**: The Universal Bundle handler only sets the new boot partition and updates the NVS UI partition index if *both* firmware and frontend streams complete successfully.
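
The magic-number checks can be sketched as a small classifier that runs on the first bytes of an upload before anything is written to flash. `UploadKind`, `classify_upload`, and the two `accept_*` helpers are hypothetical names used for illustration; only the `0xE9` byte and the `BNDL` magic come from the design above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class UploadKind { Firmware, Bundle, Unknown };

// Inspects the first bytes of an upload and reports what it looks like.
UploadKind classify_upload(const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    if (len >= 4 && std::memcmp(data, "BNDL", 4) == 0) {
        return UploadKind::Bundle;    // Universal Bundle header
    }
    if (len >= 1 && data[0] == 0xE9) {
        return UploadKind::Firmware;  // ESP32 application image magic byte
    }
    return UploadKind::Unknown;       // e.g. a filesystem image or garbage
}

// Firmware endpoint: reject anything that is not an ESP32 image.
bool accept_on_firmware_endpoint(const std::uint8_t* data, std::size_t len) {
    return classify_upload(data, len) == UploadKind::Firmware;
}

// Bundle endpoint: reject anything that does not start with "BNDL".
bool accept_on_bundle_endpoint(const std::uint8_t* data, std::size_t len) {
    return classify_upload(data, len) == UploadKind::Bundle;
}
```

A check like this only catches the wrong file type. It does not replace the image verification that the OTA APIs perform when the update is finalized.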
### 3.5. Versioning & Partition Metadata
- **Firmware Versioning**: Read from the image's `esp_app_desc_t`, which keeps the version reported by the API in sync with CMake's `PROJECT_VER`.
- **Space Reporting**: The system dynamically scans App partitions using `esp_image_get_metadata()` to determine the exact binary size flashed in each slot. This allows the UI to display accurate "used" and "free" space per partition, regardless of the fixed partition size.
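
The "used" and "free" figures reduce to simple arithmetic once the image length is known. The sketch below assumes the length comes from the `image_len` field of `esp_image_metadata_t` and the slot size from the partition's `size`; `SlotUsage` and `slot_usage` are hypothetical names, and reporting a slot without a valid image as `image_len = 0` is an assumption.

```cpp
#include <cassert>
#include <cstdint>

struct SlotUsage {
    std::uint32_t used_bytes;
    std::uint32_t free_bytes;
    std::uint32_t used_percent;  // rounded down
};

// partition_size: fixed size of the slot from the partition table.
// image_len: exact length of the image currently flashed in that slot.
SlotUsage slot_usage(std::uint32_t partition_size, std::uint32_t image_len) {
    assert(partition_size > 0);
    // Clamp so a bogus length can never produce a negative "free" value.
    const std::uint32_t used =
        image_len < partition_size ? image_len : partition_size;
    const std::uint32_t pct = static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(used) * 100u) / partition_size);
    return SlotUsage{used, partition_size - used, pct};
}
```

For example, a 1,200,000-byte image in a 2 MiB slot leaves 897,152 bytes free (57% used).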
## 4. Final Architecture
### 4.1. The Partition Table
```csv
# Name, Type, SubType, Offset, Size
nvs, data, nvs, 0x9000, 0x6000
otadata, data, ota, , 0x2000
phy_init, data, phy, , 0x1000
factory, app, factory, , 2M
ota_0, app, ota_0, , 2M
ota_1, app, ota_1, , 2M
www_0, data, littlefs, , 1M
www_1, data, littlefs, , 1M
```
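Because most offsets in the table are left blank, they are assigned when the table is generated. The sketch below reproduces that arithmetic on the host, assuming the usual ESP-IDF rules that app partitions are aligned to `0x10000` and data partitions to `0x1000`. `Part`, `assign_offsets`, and `provider_table` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kMiB = 1024u * 1024u;

struct Part {
    const char* name;
    bool is_app;           // app partitions need 64 KiB alignment
    std::uint32_t size;
    std::uint32_t offset;  // filled in by assign_offsets()
};

inline std::uint32_t align_up(std::uint32_t v, std::uint32_t a) {
    assert(a != 0);
    return (v + a - 1) / a * a;
}

// Assigns consecutive offsets and returns the end of the last partition.
std::uint32_t assign_offsets(std::vector<Part>& parts,
                             std::uint32_t first_offset) {
    std::uint32_t cur = first_offset;
    for (Part& p : parts) {
        cur = align_up(cur, p.is_app ? 0x10000u : 0x1000u);
        p.offset = cur;
        cur += p.size;
    }
    return cur;
}

// The table from this section, with sizes in bytes.
std::vector<Part> provider_table() {
    return {
        {"nvs", false, 0x6000u, 0u},      {"otadata", false, 0x2000u, 0u},
        {"phy_init", false, 0x1000u, 0u}, {"factory", true, 2u * kMiB, 0u},
        {"ota_0", true, 2u * kMiB, 0u},   {"ota_1", true, 2u * kMiB, 0u},
        {"www_0", false, 1u * kMiB, 0u},  {"www_1", false, 1u * kMiB, 0u},
    };
}
```

Under these assumptions `factory` lands at `0x20000` and the table ends at `0x820000`, slightly above 8 MiB, so the layout needs a module with more than 8 MB of flash.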
### 4.2. Backend Components
- `bundle.cpp`: Handles `POST /api/ota/bundle`. Streams the file, splitting it on the fly into the inactive `ota` and `www` partitions.
- `firmware.cpp` & `frontend.cpp`: Handle individual component updates.
- `status.cpp`: Uses `esp_partition_find` and `esp_image_get_metadata` to report partition sizes and active slots.
- `main.cpp`: Calls `esp_ota_mark_app_valid_cancel_rollback()` once the network connection is established, and manages NVS synchronization for the UI slot when booting from Factory.
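
The on-the-fly split done by the bundle handler can be sketched as a small state machine that accepts HTTP body chunks of any size and routes bytes to a firmware sink and a UI sink. This is a host-side illustration, not the contents of `bundle.cpp`: `BundleSplitter` and its sinks are hypothetical, the size fields are assumed to be little-endian, and rejecting zero-length sections is an assumption. On the device the sinks would wrap the OTA and partition write calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>

class BundleSplitter {
public:
    // A sink writes one run of bytes and returns false on failure.
    using Sink = std::function<bool(const std::uint8_t*, std::size_t)>;

    BundleSplitter(Sink fw, Sink ui) : fw_(std::move(fw)), ui_(std::move(ui)) {}

    // Feeds the next chunk of the upload; returns false once the stream is bad.
    bool feed(const std::uint8_t* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        while (len > 0 && !failed_) {
            std::size_t n = 0;
            if (hdr_len_ < 12) {  // still collecting the 12-byte header
                n = std::min(len, 12 - hdr_len_);
                std::memcpy(hdr_ + hdr_len_, data, n);
                hdr_len_ += n;
                if (hdr_len_ == 12) parse_header();
            } else if (fw_left_ > 0) {  // firmware section
                n = std::min<std::size_t>(len, fw_left_);
                if (!fw_(data, n)) failed_ = true;
                fw_left_ -= static_cast<std::uint32_t>(n);
            } else if (ui_left_ > 0) {  // UI section
                n = std::min<std::size_t>(len, ui_left_);
                if (!ui_(data, n)) failed_ = true;
                ui_left_ -= static_cast<std::uint32_t>(n);
            } else {
                failed_ = true;  // bytes beyond the declared sizes
            }
            data += n;
            len -= n;
        }
        return !failed_;
    }

    // True only if both sections arrived in full: the commit precondition.
    bool complete() const {
        return !failed_ && hdr_len_ == 12 && fw_left_ == 0 && ui_left_ == 0;
    }

private:
    static std::uint32_t le32(const std::uint8_t* p) {
        return p[0] | (p[1] << 8) | (p[2] << 16) |
               (static_cast<std::uint32_t>(p[3]) << 24);
    }
    void parse_header() {
        fw_left_ = le32(hdr_ + 4);
        ui_left_ = le32(hdr_ + 8);
        if (std::memcmp(hdr_, "BNDL", 4) != 0 || fw_left_ == 0 || ui_left_ == 0) {
            failed_ = true;
        }
    }

    Sink fw_, ui_;
    std::uint8_t hdr_[12] = {};
    std::size_t hdr_len_ = 0;
    std::uint32_t fw_left_ = 0, ui_left_ = 0;
    bool failed_ = false;
};
```

In this sketch the handler would switch the boot partition and the NVS UI index only when `complete()` returns true, which matches the atomic-commit rule in section 3.4.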
### 4.3. UI/UX Implementation
- The Svelte Dashboard features a comprehensive "Update System" component supporting individual (FW/UI) and combined (Bundle) uploads.
- A "Partition Table" view provides real-time visibility into the exact binary size, available free space, and version hash of every system and app partition.
## 5. Summary
We use **ESP-IDF's native OTA APIs** with a **Factory + Dual A/B Partition** layout, synchronized with a **Dual LittleFS Partition** layout for the frontend. Custom **Universal Bundles** keep FW and UI upgrades atomic, protected by **Magic Number validations** and **Automatic App Rollbacks**. The whole process is driven from the Svelte UI, which uses backend metadata extraction to report versions and partition usage accurately.

---
*Created by Antigravity - Last Updated: 2026-03-03*