In-App Live Assistant

This is a project I am actively pursuing: an open-source “in-app live assistant” pattern that other product teams can embed.

The core idea is simple: if the assistant can hear the user and see the current UI, it can give concrete next-step guidance instead of generic chatbot answers.

Goal #

Build a reusable baseline that can be integrated into a web product to provide:

  1. Live help: microphone + screen context in one session.
  2. Fast responses: low-latency, voice-first guidance.
  3. Step-by-step flow: one instruction, wait for action, next instruction.
  4. Measurement hooks: optional xAPI statements so “learning-in-the-product” can be observed, not just assumed.

Working assumption #

My assumption is that most “help” content fails inside products because it is separated from the moment of need: the user has to leave the flow to find it.

In this case, I am treating help as a live interaction loop:

  • the user tries a step
  • the assistant sees what happened
  • the assistant adjusts and guides the next step
  • the assistant and user stay in the flow through natural language

If I can make that loop reliable, the rest (knowledge integration, evaluation, productization) becomes a sequence of iterations.
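
To make the shape of that loop concrete, here is a minimal TypeScript sketch. The `LiveSession` interface and every name in it are hypothetical stand-ins for the runtime described below, not an existing API:

```ts
// Hypothetical shape of the live interaction loop. None of these
// types exist yet; they stand in for the streaming runtime below.
type StepOutcome = { completed: boolean; screenContext: string };

interface LiveSession {
  speak(instruction: string): Promise<void>;    // voice guidance out
  observeStep(): Promise<StepOutcome>;          // wait for the user to act
  nextInstruction(outcome: StepOutcome): Promise<string | null>; // model decides what comes next
}

// One instruction, wait for the action, adjust, repeat.
async function guide(session: LiveSession, firstInstruction: string) {
  let instruction: string | null = firstInstruction;
  while (instruction !== null) {
    await session.speak(instruction);
    const outcome = await session.observeStep();
    instruction = await session.nextInstruction(outcome);
  }
}
```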

Current approach (high level) #

I am building this as a small set of composable parts, not a monolith:

  1. A lightweight web client that can capture mic audio + screen context and stream it during a session (a capture sketch follows the diagram below).
  2. A streaming runtime that manages sessions and forwards live inputs to a multimodal model, then streams responses back.
  3. An instruction + guardrail layer tuned for support: brief, concrete, no professional advice, and willing to say “I don’t know”.
  4. A grounding interface (initially generic) that can later be swapped for product documentation and internal procedures.
  5. An xAPI emitter (optional) to record progress signals to a Learning Record Store (LRS).
```mermaid
flowchart TB
    U[User in product UI] -->|Mic + screen| C[Embedded client]
    C -->|Live session| R[Streaming runtime]
    R -->|Realtime inputs| M[Multimodal model]
    M -->|Audio + text events| R
    R -->|Streamed response| C
    C -->|optional xAPI statements| L[LRS]
```
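
For part 1, the browser already has the primitives. A minimal capture sketch, assuming the runtime exposes a WebSocket endpoint (the URL and the chunk-over-WebSocket framing are assumptions for illustration, not a fixed protocol):

```ts
// Minimal capture sketch: mic audio via getUserMedia, screen via
// getDisplayMedia, both sent over one WebSocket session.
// The wss:// URL and message framing are assumptions, not a spec.
async function startSession(runtimeUrl: string): Promise<WebSocket> {
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  const screen = await navigator.mediaDevices.getDisplayMedia({ video: true });

  const ws = new WebSocket(runtimeUrl); // e.g. "wss://runtime.example/session"
  await new Promise((resolve, reject) => {
    ws.onopen = resolve;
    ws.onerror = reject;
  });

  // Stream mic audio in small chunks to keep latency low.
  const recorder = new MediaRecorder(mic, { mimeType: "audio/webm;codecs=opus" });
  recorder.ondataavailable = (e) => ws.send(e.data);
  recorder.start(250); // 250 ms chunks

  // Send periodic screen snapshots rather than full video, to cut bandwidth.
  const video = document.createElement("video");
  video.srcObject = new MediaStream([screen.getVideoTracks()[0]]);
  await video.play();
  const canvas = document.createElement("canvas");
  setInterval(() => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    canvas.getContext("2d")?.drawImage(video, 0, 0);
    canvas.toBlob((blob) => blob && ws.send(blob), "image/jpeg", 0.6);
  }, 1000);

  return ws;
}
```

Most of the hardening work named under risks below lives here: permission denials, reconnection, and backpressure on the socket.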

xAPI angle (why include it) #

I want to test a specific idea: if the assistant is guiding someone through a workflow, that guidance is a learning experience, and it should be measurable.

At a high level, I am looking to capture signals like:

  • a user attempted a step
  • a step was completed (or failed)
  • the assistant intervened
  • time-to-complete and number of corrections
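
Concretely, “a step was completed” maps cleanly onto a standard xAPI statement. A sketch of emitting one to an LRS; the ADL “completed” verb and the `/statements` resource are from the xAPI spec, while the activity ID and extension key are placeholder vocabulary, not a settled profile:

```ts
// Sketch of emitting one "step completed" statement to an LRS.
// The verb is the standard ADL "completed" verb; the activity ID and
// the extension key are placeholder vocabulary, not a settled profile.
async function emitStepCompleted(lrsEndpoint: string, authHeader: string) {
  const statement = {
    actor: { account: { homePage: "https://product.example", name: "user-123" } },
    verb: { id: "http://adlnet.gov/expapi/verbs/completed", display: { "en-US": "completed" } },
    object: {
      id: "https://product.example/workflows/example/steps/2",
      definition: { name: { "en-US": "Pick the export format" } },
    },
    result: {
      success: true,
      duration: "PT42S", // ISO 8601 duration: time-to-complete
      extensions: { "https://product.example/xapi/ext/corrections": 1 }, // number of corrections
    },
  };

  // Per the xAPI spec, statements are POSTed to the LRS's /statements resource.
  await fetch(`${lrsEndpoint}/statements`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Experience-API-Version": "1.0.3",
      Authorization: authHeader, // e.g. Basic auth for the LRS
    },
    body: JSON.stringify(statement),
  });
}
```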

The point is not surveillance. The point is to create feedback loops:

  • Did guidance reduce errors?
  • Did completion rates improve?
  • Which steps cause repeat confusion?
  • Are users becoming more independent over time?

Status #

Done:

  • Working end-to-end prototype of the live interaction loop (voice + screen → voice guidance)

Next:

  • Package the pattern as a clean open-source “starter kit”
  • Define a cloud infrastructure approach that works for the prototype and accounts for the PII that sessions capture and log
  • Define a small xAPI vocabulary for “guided in-product learning” events
  • Add a minimal evaluation harness (latency, completion, error rate)

Risks and constraints #

  • Privacy and security: screen + voice can contain sensitive data. Production use needs explicit consent, access control, and retention limits.
  • Grounding quality: without a curated knowledge source, the assistant can be confidently wrong. Scope must be narrow.
  • Reliability: streaming UX breaks in messy ways (permissions, jitter, reconnection). This needs real hardening work.

Next step #

My next step is to turn the prototype into something others can actually use:

  1. define the smallest “embed surface” for a product team (a first sketch follows this list)
  2. define the cloud and AI architecture
  3. add the xAPI measurement path behind a feature flag
  4. publish a reference workflow and measure it end-to-end
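
For the first item, my working guess is that the embed surface can stay very small. A hypothetical first cut in TypeScript; every name and option here is an assumption, not a published package:

```ts
// Hypothetical "embed surface": the smallest API a product team would
// touch. Names and options are a first cut, not a real package.
interface AssistantOptions {
  runtimeUrl: string;                       // streaming runtime endpoint
  workflowId?: string;                      // narrow grounding to one workflow
  emitXapi?: boolean;                       // measurement path behind a feature flag
  requestConsent?: () => Promise<boolean>;  // explicit mic/screen consent hook
}

interface InAppAssistant {
  start(): Promise<void>;  // ask permissions, open the live session
  stop(): void;            // tear down capture and the session
}

declare function createAssistant(options: AssistantOptions): InAppAssistant;

// Intended integration cost: two lines in the host product.
// const assistant = createAssistant({ runtimeUrl: "wss://runtime.example/session" });
// await assistant.start();
```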

Related: my first write-up of the experiment path that led here is in /posts/live-streaming-in-app-assistant/.