Epic 3: Voice-to-Task AI (Gemini)

Goal

Allow users to dictate tasks naturally in French, directly creating JSON task structures on the Provider.

Context

Adding tasks by keyboard through the web dashboard adds friction. We want to use the Google Gemini API to ingest natural language (specifically French) and infer the task's JSON properties (task name, user, due date). Input can come from the Svelte frontend (browser recording) or from an iOS Shortcut automation.

Scope & Technologies to Investigate

  1. Ingestion Endpoint:
    • Create a POST /api/tasks/audio endpoint on the ESP32-S3 Provider.
    • Decide exactly what format this endpoint accepts: raw audio such as .wav or .m4a, or text already transcribed by the device's native speech-to-text. (During implementation, note whether ESP32 memory constraints force pre-transcription on iOS before sending.)
  2. Gemini API Integration:
    • Implement an HTTP/HTTPS client on the ESP32-S3 to call Gemini API endpoints.
    • Construct a prompt template: "You are an assistant. Extract the task properties from the following French audio/text and output strict JSON matching our schema: { "title": "...", "due_date": "...", "user_name": "..." }"
  3. API Key Management:
    • Add a configuration page to the Svelte dashboard that lets the user save their Gemini API key securely to the SD card.
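
The prompt construction in item 2 can be sketched in portable C as below. This is a minimal sketch, not the Provider's implementation: it assumes the endpoint receives an already transcribed text, and that the request body follows the `contents` / `parts` / `text` shape of Gemini's generateContent call, which should be checked against the current API reference. The names `build_gemini_body` and `json_escape_append` and the `Text:` separator are hypothetical. The point is that the user's words must be JSON-escaped before being embedded in the request, since a dictated quote or newline would otherwise break the body.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prompt text from this epic; the trailing "Text: " separator is an assumption. */
static const char *PROMPT_PREFIX =
    "You are an assistant. Extract the task properties from the following "
    "French audio/text and output strict JSON matching our schema: "
    "{ \"title\": \"...\", \"due_date\": \"...\", \"user_name\": \"...\" }"
    "\n\nText: ";

/* Append src to dst (current length len, capacity cap) as JSON string content.
 * Escapes quotes, backslashes and control characters; UTF-8 bytes such as
 * French accents pass through unchanged. Returns the new length, or -1 if
 * dst is too small. */
static int json_escape_append(char *dst, size_t cap, size_t len, const char *src)
{
    for (const unsigned char *p = (const unsigned char *)src; *p; ++p) {
        char esc[7]; /* longest escape is \u00XX plus the terminator */
        int n;
        if (*p == '"' || *p == '\\') {
            n = snprintf(esc, sizeof esc, "\\%c", *p);
        } else if (*p == '\n') {
            n = snprintf(esc, sizeof esc, "\\n");
        } else if (*p < 0x20) {
            n = snprintf(esc, sizeof esc, "\\u%04x", (unsigned)*p);
        } else {
            esc[0] = (char)*p;
            esc[1] = '\0';
            n = 1;
        }
        if (len + (size_t)n + 1 > cap) return -1;
        memcpy(dst + len, esc, (size_t)n + 1); /* copies the terminator too */
        len += (size_t)n;
    }
    return (int)len;
}

/* Write {"contents":[{"parts":[{"text":"<prompt><transcript>"}]}]} into out.
 * Returns the body length (excluding the terminator), or -1 if out is too small. */
int build_gemini_body(char *out, size_t cap, const char *transcript)
{
    static const char head[] = "{\"contents\":[{\"parts\":[{\"text\":\"";
    static const char tail[] = "\"}]}]}";
    assert(out != NULL && transcript != NULL);

    if (cap < sizeof head) return -1;
    memcpy(out, head, sizeof head);
    int len = (int)(sizeof head - 1);

    len = json_escape_append(out, cap, (size_t)len, PROMPT_PREFIX);
    if (len < 0) return -1;
    len = json_escape_append(out, cap, (size_t)len, transcript);
    if (len < 0) return -1;

    if ((size_t)len + sizeof tail > cap) return -1;
    memcpy(out + len, tail, sizeof tail);
    return len + (int)(sizeof tail - 1);
}
```

A fixed caller-supplied buffer is used instead of heap allocation so that the worst-case memory cost is visible up front, which matters under the ESP32 memory constraints mentioned in item 1.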

Next Steps to Start

  1. Create tdd/voice_ai_integration.md.
  2. Establish a secure HTTPS connection from ESP-IDF to the Google Gemini API (managing certificates through mbedTLS).
  3. Test a hardcoded prompt end to end before wiring up the web UI.
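
For the hardcoded prompt test in step 3, a first pass/fail signal could be a smoke check that the reply mentions all three schema keys. The sketch below is illustrative only: `reply_has_required_keys` is a hypothetical helper, it is a substring check and not a JSON parser, and it assumes the model's reply text has already been extracted from the HTTP response. Real validation would use a proper parser.

```c
#include <stdbool.h>
#include <string.h>

/* Keys required by the schema in the prompt template, as quoted JSON tokens. */
static const char *const REQUIRED_KEYS[] = {
    "\"title\"", "\"due_date\"", "\"user_name\""
};

/* Smoke check only: true if every required key appears somewhere in reply.
 * Does not verify that the reply is well-formed JSON. */
bool reply_has_required_keys(const char *reply)
{
    if (reply == NULL) return false;
    for (size_t i = 0; i < sizeof REQUIRED_KEYS / sizeof REQUIRED_KEYS[0]; ++i) {
        if (strstr(reply, REQUIRED_KEYS[i]) == NULL) return false;
    }
    return true;
}
```

A reply that fails this check is a quick hint that the prompt wording needs tightening before any UI work starts.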