Thesis

Why Visual Intent Will Eat Typed Search

Published 21 April 2026 · 8 min read

Coming soon — join the waitlist

Quick answer. Every major mobile platform now operates a visual-search surface: Google Lens (launched 2017, mass adoption by 2022), Apple Visual Lookup (2021), Snap Scan (2022), and TikTok visual search (2023). What none of them has shipped is committable commerce from the camera. That is the gap.

What the camera is becoming

For decades, typed search was the default entry point for information. That default is eroding. Google Lens crossed a billion monthly users in 2022 (Google I/O 2022 keynote). Apple Visual Lookup has shipped on iPhones since iOS 15. Snap has publicly positioned its camera as its primary search surface, and TikTok added visual search in 2023. The camera is no longer a secondary input; it is a first-class one.

Why this wasn’t possible until now

Three things changed. On-device neural processing reached the point where visual embeddings can be computed on a phone without ruinous battery cost. Multi-modal models (CLIP, 2021; PaLI, 2022; GPT-4V, 2023) collapsed the historical gap between "what does this image contain" and "what should happen next". And AR headsets (Meta Quest 3 in 2023, Apple Vision Pro in 2024) signalled that the camera is about to be always-on for a meaningful user base.
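To make the first two points concrete, here is a minimal sketch of scoring a single camera frame against a few candidate descriptions with an open multi-modal model (CLIP via the Hugging Face transformers library). The model choice, file name, and labels are illustrative assumptions, not a description of how GeraLens actually works; a production lens would run a quantised model on-device rather than pull weights from a hub.

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative only: a hosted CLIP checkpoint and a handful of candidate labels.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("frame.jpg")  # one camera frame (hypothetical file)
labels = ["a restaurant storefront", "a medication label", "a retail product"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)[0]  # similarity → probabilities

for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.2f}")

The same embedding that answers "what does this image contain" is what a downstream layer can use to decide "what should happen next".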

What’s missing

All the existing lenses identify. None of them transact. You point at a restaurant; Google Lens tells you the name. You still have to switch apps to book a table. You point at a product; Apple Visual Lookup tells you what it is. You still have to find a shopping flow. The commit step is broken.

This is not because the platforms don’t understand the opportunity. It is because the supply-side liquidity (real services willing to expose a one-tap commit surface) doesn’t exist yet. A lens without services is identification theatre.

Why GeraLens can close the gap

The twenty-eight Gera products are the supply side. Every Gera vertical exposes a well-known capabilities manifest (via GeraNexus). The lens maps a recognised object to the most relevant Gera action and makes that action one tap away.
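The mapping step can be sketched in a few lines. Everything below is hypothetical: the manifest fields, vertical names, actions, and deep links are invented for illustration; the post only states that each vertical exposes a capabilities manifest via GeraNexus.

from dataclasses import dataclass

@dataclass
class Capability:
    vertical: str      # which Gera product serves this object type
    object_type: str   # what the lens recognised
    action: str        # the one-tap commit the lens can surface
    deep_link: str     # where the tap lands

# Hypothetical slice of a capabilities manifest.
MANIFEST = [
    Capability("gera-dining", "restaurant", "book_table", "gera://dining/book"),
    Capability("gera-health", "medication_label", "check_interactions", "gera://health/interactions"),
    Capability("gera-shop", "retail_product", "buy_now", "gera://shop/checkout"),
]

def resolve(object_type: str) -> Capability | None:
    """Return the first capability that can commit on the recognised object."""
    return next((c for c in MANIFEST if c.object_type == object_type), None)

hit = resolve("restaurant")
print(hit.action if hit else "no committable action")  # -> book_table

The point of the manifest is that the lens never hard-codes verticals; new Gera products become one-tap actions by publishing an entry.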

The user behaviour already exists

"Check if that restaurant is worth going to" is already a camera behaviour — people photograph the sign, then switch to Maps or Yelp. "Is this medication safe with my other medication?" is already a camera behaviour — photographed label, switched to a search engine. The lens collapses those two steps into one.

What this is not

Not "another AR app". AR is one delivery surface; the lens is the capability. It runs on a phone today and on AR glasses tomorrow.

The hard parts

On-device recognition for privacy — the camera frame must not routinely leave the device. Intent disambiguation — pointing at a restaurant could mean "book" or "order delivery" or "get directions". Graceful degradation — the network is not always there. These are real engineering problems, not marketing problems.
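Intent disambiguation is the easiest of the three to sketch. The example below ranks candidate intents for a recognised restaurant using crude context signals; the signals, weights, and intent names are invented assumptions, and the only claims taken from the post are that disambiguation and offline fallback are required.

from dataclasses import dataclass

@dataclass
class Context:
    hour: int       # local hour of day
    walking: bool   # rough activity signal
    online: bool    # network availability

def rank_intents(object_type: str, ctx: Context) -> list[tuple[str, float]]:
    """Return candidate intents for a recognised object, highest score first."""
    if object_type != "restaurant":
        return []
    scores = {
        "get_directions": 0.3 + (0.4 if ctx.walking else 0.0),
        "book_table": 0.5 if 17 <= ctx.hour <= 21 else 0.2,
        "order_delivery": 0.5 if not ctx.walking else 0.1,
    }
    if not ctx.online:
        # Graceful degradation: keep only intents that work without the network.
        scores = {k: v for k, v in scores.items() if k == "get_directions"}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_intents("restaurant", Context(hour=19, walking=True, online=True)))

A real implementation would learn these weights from behaviour rather than hand-tune them, but the shape of the problem is the same: one recognised object, several plausible commits, and a ranking that must degrade sensibly offline.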

How this interacts with agents

In 2028, an agent running on your AR glasses will not need you to trigger the lens manually. It will see what you see, notice when an interaction is helpful, and surface the option ambiently. The consent layer for that future looks a lot like GeraMind’s per-query consent.

Honest caveats

AR glasses penetration is uncertain. If viable consumer AR hardware doesn’t ship by 2028 — Meta / Apple / a surprise entrant — GeraLens remains a mobile-only product. That is still useful; it is just smaller than the full vision.

Help us design ambient discovery.

Join the waitlist