Protocol Deep-Dive: The Frame-to-Action Pipeline
Published 21 April 2026 · 12 min read
Why a pipeline, not a model
The temptation is to point at a frame and have a single model decide everything. This is unsafe, because bystanders, private settings, and misidentifications each introduce errors that compound. A pipeline with explicit stages and refusal points keeps costly mistakes contained within a single stage instead of letting them surface at the transaction.
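To make the contrast concrete, here is a minimal sketch of a staged pipeline in which any stage may refuse, and the run stops at that stage rather than carrying a bad input forward. The names `Refusal`, `Outcome` and `run_pipeline` are hypothetical illustrations, not a published Gera API.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Refusal:
    """Returned by a stage that declines to pass its input onward."""
    reason: str


@dataclass
class Outcome:
    value: Optional[Any]        # final output if every stage accepted
    refused_at: Optional[str]   # name of the stage that refused, if any
    reason: Optional[str] = None


def run_pipeline(stages: List[Tuple[str, Callable[[Any], Any]]], frame: Any) -> Outcome:
    """Run stages in order and stop at the first Refusal, so the failure stays local."""
    value = frame
    for name, stage in stages:
        result = stage(value)
        if isinstance(result, Refusal):
            return Outcome(value=None, refused_at=name, reason=result.reason)
        value = result
    return Outcome(value=value, refused_at=None)
```

Each stage is a plain function, so the gate, recognition, routing and consent steps described below can be slotted in and tested independently; the `Outcome` records exactly which stage stopped the run.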
Stage 1: on-device classifier (the privacy gate)
A small on-device model decides whether the current frame is relevant to commerce at all. A menu, a storefront, a product barcode, a car licence plate, a damaged appliance — yes. A face, a minor, a private document, a bathroom, a hospital bed — no. The gate model produces {relevant: bool, category, sensitivity_flags}. Nothing leaves the phone unless relevance is confirmed and sensitivity flags are clear.
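The upload rule can be sketched as a single predicate over the gate's output. This assumes the sensitivity flags arrive as a set of string labels; `GateResult` and `may_leave_device` are hypothetical names, not the shipped on-device interface.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class GateResult:
    relevant: bool
    category: str
    # Hypothetical labels such as "face" or "minor"; any label at all blocks upload.
    sensitivity_flags: FrozenSet[str] = frozenset()


def may_leave_device(result: GateResult) -> bool:
    """Upload only when relevance is confirmed AND no sensitivity flag is raised."""
    return result.relevant and not result.sensitivity_flags
```

The check fails closed: an unfamiliar flag label still blocks the upload, because the predicate asks whether the flag set is empty rather than matching it against a list of known-bad labels.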
Stage 2: cloud recognition (the structured descriptor)
If the gate says yes, an embedding of a tightly cropped region of interest travels to the cloud. The cloud returns a structured descriptor:
```json
{
  "object": "restaurant_storefront",
  "text": "Sajna Indian Kitchen",
  "address_hint": "Leicester LE1",
  "confidence": 0.88,
  "alternatives": [{ "object": "bakery", "confidence": 0.04 }]
}
```

The frame itself is not retained. Only the descriptor and a tiny perceptual hash (for abuse detection) are kept.
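On the client side, the descriptor can be parsed into typed fields before anything acts on it. This is a sketch only: the field names follow the sample descriptor, but `MIN_CONFIDENCE = 0.8` is a hypothetical cut-off, since the article does not say what threshold (if any) gates the next stage.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold; the real value is not published.
MIN_CONFIDENCE = 0.8


@dataclass
class Alternative:
    object: str
    confidence: float


@dataclass
class Descriptor:
    object: str
    text: Optional[str]
    address_hint: Optional[str]
    confidence: float
    alternatives: List[Alternative]


def parse_descriptor(raw: str) -> Descriptor:
    """Parse the cloud's JSON reply; optional fields default to None or []."""
    data = json.loads(raw)
    return Descriptor(
        object=data["object"],
        text=data.get("text"),
        address_hint=data.get("address_hint"),
        confidence=float(data["confidence"]),
        alternatives=[
            Alternative(a["object"], float(a["confidence"]))
            for a in data.get("alternatives", [])
        ],
    )


def is_actionable(d: Descriptor) -> bool:
    """Only a sufficiently confident top match proceeds to intent resolution."""
    return d.confidence >= MIN_CONFIDENCE
```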
Stage 3: intent resolution
The descriptor maps to a Gera vertical via a published routing table. restaurant_storefront routes to GeraEats (book a table or order delivery). damaged_appliance routes to GeraHome (repair service quote). vehicle_plate routes to GeraRide / GeraSure.
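The three routes named above fit naturally in a lookup table keyed on the descriptor's `object` field. This sketch assumes the published table has roughly this shape; `ROUTING_TABLE` and `resolve_intent` are illustrative names.

```python
from typing import Dict, Tuple

# Routes taken from the examples in the text; the full published table is larger.
ROUTING_TABLE: Dict[str, Tuple[str, ...]] = {
    "restaurant_storefront": ("GeraEats",),
    "damaged_appliance": ("GeraHome",),
    "vehicle_plate": ("GeraRide", "GeraSure"),
}


def resolve_intent(object_label: str) -> Tuple[str, ...]:
    """Return candidate verticals; an empty tuple means no route, so nothing is proposed."""
    return ROUTING_TABLE.get(object_label, ())
```

Returning an empty tuple for unknown labels, rather than guessing a nearest vertical, keeps an unrecognised object from turning into a transaction proposal.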
Stage 4: consent scoping
Before any transaction proceeds, the user sees a quote card showing the detected object, the resolved intent, the proposed action, and the expected cost. Only after the user presses confirm is a narrowly scoped consent token issued (via GeraNexus).
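What "narrowly scoped" might mean in practice can be sketched as a token bound to one vertical, one action, a cost ceiling and an expiry. The GeraNexus token format is not described here, so every field, the 300-second lifetime and the `book_table` action name are hypothetical.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConsentToken:
    token_id: str
    vertical: str          # e.g. "GeraEats"
    action: str            # hypothetical action name, e.g. "book_table"
    max_cost_minor: int    # ceiling in minor currency units, taken from the quote card
    expires_at: float      # Unix timestamp after which the token is void


def issue_token(confirmed: bool, vertical: str, action: str, quoted_cost_minor: int,
                ttl_s: float = 300.0, now: Optional[float] = None) -> Optional[ConsentToken]:
    """Issue a token only once the user has pressed confirm on the quote card."""
    if not confirmed:
        return None
    issued_at = time.time() if now is None else now
    return ConsentToken(secrets.token_hex(16), vertical, action,
                        quoted_cost_minor, issued_at + ttl_s)


def authorises(token: ConsentToken, vertical: str, action: str,
               cost_minor: int, now: float) -> bool:
    """A request is covered only if it matches the scope, stays within the quote, and is unexpired."""
    return (token.vertical == vertical
            and token.action == action
            and cost_minor <= token.max_cost_minor
            and now < token.expires_at)
```

Using integer minor units for the cost ceiling avoids floating-point rounding when a final price is compared against the quoted one.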
Stage 5: signed commit
The transaction proceeds under GeraNexus semantics — negotiate, book, pay-in-escrow, confirm. The completion receipt is signed and returned to the user’s GeraMind vault for future reference.
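The ordered steps and the signed receipt can be sketched as a small state machine plus a signing helper. This is an assumption-laden illustration: HMAC-SHA256 from the Python standard library stands in for whatever signature scheme GeraNexus actually uses (a setting where third parties verify receipts would more plausibly use public-key signatures), and `Transaction`, `sign_receipt` and `verify_receipt` are hypothetical names.

```python
import hashlib
import hmac
import json
from typing import List

# The four steps named above, in the order they must occur.
STEPS = ("negotiate", "book", "pay_in_escrow", "confirm")


class Transaction:
    def __init__(self) -> None:
        self.completed: List[str] = []

    @property
    def done(self) -> bool:
        return len(self.completed) == len(STEPS)

    def advance(self, step: str) -> None:
        """Accept only the next step in sequence; anything else is rejected."""
        if self.done:
            raise ValueError("transaction already confirmed")
        expected = STEPS[len(self.completed)]
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append(step)


def sign_receipt(receipt: dict, key: bytes) -> str:
    """HMAC-SHA256 over canonical JSON (a stand-in for the real signature scheme)."""
    payload = json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_receipt(receipt: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking, via timing, how many characters matched.
    return hmac.compare_digest(sign_receipt(receipt, key), signature)
```

Serialising with sorted keys and fixed separators means the same receipt always yields the same bytes, so a receipt stored in the vault can be re-verified later.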
What the pipeline refuses
- Face recognition of anyone, ever, for any purpose.
- Minors as the primary subject of any commerce action.
- Private-setting frames (interiors of homes other than the user’s own, public bathrooms, medical environments).
- Documents containing personal data (passports, IDs, medical letters) unless the user explicitly invoked a document-scan action.
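The four refusal rules above can be expressed as ordered checks that return the first applicable reason. The signal names on `FrameContext` and the setting labels are hypothetical; the article lists the rules but not how the gate reports them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical setting labels covering the private settings listed above.
PRIVATE_SETTINGS = {"other_home", "public_bathroom", "medical"}


@dataclass(frozen=True)
class FrameContext:
    requests_face_recognition: bool = False
    primary_subject_is_minor: bool = False
    setting: str = "public"                  # e.g. "public", "own_home", "medical"
    contains_personal_document: bool = False
    user_invoked_document_scan: bool = False


def refusal_reason(ctx: FrameContext) -> Optional[str]:
    """Return why the frame is refused, or None if no rule applies."""
    if ctx.requests_face_recognition:
        return "face recognition is never performed"
    if ctx.primary_subject_is_minor:
        return "minor as primary subject"
    if ctx.setting in PRIVATE_SETTINGS:
        return f"private setting: {ctx.setting}"
    if ctx.contains_personal_document and not ctx.user_invoked_document_scan:
        return "personal document without an explicit document-scan action"
    return None
```

The returned string is the kind of refusal reason that can be logged without the frame content.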
Audit
Every pipeline stage writes a log entry. The user can inspect the chain for any transaction. Third-party arbitrators can verify the chain during a dispute. Refused frames are logged with the refusal reason but not the frame content.
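The article does not say how the log is made verifiable. One common approach, assumed here, is a hash chain: each entry stores the SHA-256 digest of the previous one, so altering any earlier entry breaks every later link. `append_entry` and `verify_chain` are hypothetical names.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict[str, Any]) -> str:
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


def append_entry(log: List[Dict[str, Any]], stage: str, detail: Dict[str, Any]) -> None:
    """Append an entry linked to the hash of the one before it."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"stage": stage, "detail": detail, "prev": prev}
    log.append({**body, "hash": _digest(body)})


def verify_chain(log: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and link; any edit to an entry makes this False."""
    prev = GENESIS
    for entry in log:
        body = {"stage": entry["stage"], "detail": entry["detail"], "prev": entry["prev"]}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True
```

A refused frame would be recorded as, for example, `append_entry(log, "gate", {"refused": True, "reason": "private setting"})`, which keeps the reason but carries no frame data.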
What we are still designing
- On-device gate-model size vs phone-battery impact. A smaller model produces more false positives that are sent to the cloud, and therefore more privacy leakage. This is a tuning problem.
- Bystander consent at scale. No simple answer; current design blurs bystanders aggressively in the cloud-hosted embedding.
- Cross-cultural visual vocabulary — a storefront in Tbilisi does not look like a storefront in Leicester. The routing table must be regional.
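One way a regional routing table might be organised is a shared default table with per-region overrides consulted first. This is a sketch of that assumed design only: the region keys and the `khachapuri_counter` label are hypothetical and do not come from any published table.

```python
from typing import Dict, Optional

# Shared routes that apply everywhere unless a region overrides them.
DEFAULT_ROUTES: Dict[str, str] = {"restaurant_storefront": "GeraEats"}

# Hypothetical per-region additions for locally distinctive storefront types.
REGIONAL_ROUTES: Dict[str, Dict[str, str]] = {
    "tbilisi": {"khachapuri_counter": "GeraEats"},
}


def route(region: str, object_label: str) -> Optional[str]:
    """Prefer the region's own entry, then fall back to the default table."""
    regional = REGIONAL_ROUTES.get(region, {})
    return regional.get(object_label, DEFAULT_ROUTES.get(object_label))
```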
Help us design ambient discovery.