Why Visual Intent Will Eat Typed Search
Published 21 April 2026 · 8 min read
What the camera is becoming
For decades, typed search was the default entry point for information. Google Lens crossed a billion monthly users in 2022 (Google I/O 2022 keynote). Apple Visual Lookup has shipped on every iPhone since iOS 15 (2021). Snap has publicly positioned its camera as its primary search surface, and TikTok added visual search in 2023. The camera is no longer a secondary input; it is a first-class one.
Why this wasn’t possible until now
Three things changed. On-device neural processing reached the point where visual embeddings can be computed on a phone without ruinous battery cost. Multi-modal models (CLIP, 2021; PaLI, 2022; GPT-4V, 2023) collapsed the historical gap between "what does this image contain" and "what should happen next". And AR headsets (Meta Quest 3 in 2023, Apple Vision Pro in 2024) signalled that the camera is about to be always-on for a meaningful user base.
What’s missing
All the existing lenses identify. None of them transact. You point at a restaurant; Google Lens tells you the name. You still have to switch apps to book a table. You point at a product; Apple Visual Lookup tells you what it is. You still have to find a shopping flow. The commit step is broken.
This is not because the platforms don’t understand the opportunity. It is because the supply-side liquidity (real services willing to expose a one-tap commit surface) doesn’t exist yet. A lens without services is identification theatre.
Why GeraLens can transact
The twenty-eight Gera products are the supply side. Every Gera vertical exposes a well-known capabilities manifest (via GeraNexus). The lens maps a recognised object to the most relevant Gera action and makes that action one tap away.
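A minimal sketch of what that mapping could look like. The manifest shape, product names, object categories, and action verbs below are all invented for illustration, not the real GeraNexus schema:

```python
# Illustrative sketch only: the manifest shape, product names, and
# action verbs are invented, not the real GeraNexus schema.

# Each Gera vertical publishes which object categories it can act on
# and the one-tap action it offers for each.
MANIFESTS = {
    "gera-dining": {"restaurant": "book_table"},
    "gera-health": {"medication": "check_interactions"},
    "gera-shop":   {"product": "buy_now"},
}

def resolve_action(category):
    """Map a recognised object category to (product, action), or None
    when no vertical can transact on it (identification only)."""
    for product, capabilities in MANIFESTS.items():
        if category in capabilities:
            return product, capabilities[category]
    return None

resolve_action("restaurant")  # ('gera-dining', 'book_table')
```

The `None` branch is the point: without a matching service in the manifest, the lens can only identify, which is exactly the "identification theatre" failure mode above.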
The user behaviour already exists
"Check if that restaurant is worth going to" is already a camera behaviour — people photograph the sign, then switch to Maps or Yelp. "Is this medication safe with my other medication?" is already a camera behaviour — photographed label, switched to a search engine. The lens collapses those two steps into one.
What this is not
Not "another AR app". AR is one delivery surface; the lens is the capability. It runs on a phone today and on AR glasses tomorrow.
The hard parts
On-device recognition for privacy — the camera frame must not routinely leave the device. Intent disambiguation — pointing at a restaurant could mean "book" or "order delivery" or "get directions". Graceful degradation — the network is not always there. These are real engineering problems, not marketing problems.
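To make the disambiguation problem concrete, here is a toy ranker. The candidate lists, context signals, and weights are invented for illustration (a real lens would learn them); offline, it lets only locally servable actions rank, which is the graceful-degradation behaviour in miniature:

```python
# Toy intent ranker; candidate lists, signals, and weights are
# invented for illustration, not GeraLens internals.

CANDIDATES = {
    "restaurant": ["book_table", "order_delivery", "get_directions"],
}

def rank_intents(category, context):
    """Order candidate actions for a recognised object by context."""
    def score(action):
        s = 0.0
        if action == "book_table" and context.get("hour", 12) >= 17:
            s += 1.0  # evening: booking is the likelier intent
        if action == "order_delivery" and context.get("at_home"):
            s += 1.0  # at home: delivery beats a table
        if not context.get("online", True) and action != "get_directions":
            s -= 10.0  # graceful degradation: offline, only local actions
        return s
    return sorted(CANDIDATES.get(category, []), key=score, reverse=True)

rank_intents("restaurant", {"hour": 19})
# ['book_table', 'order_delivery', 'get_directions']
```

Pointing at the same restaurant thus yields a different top suggestion in the evening, at home, or with no network, without the user typing anything.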
How this interacts with agents
In 2028, an agent running on your AR glasses will not need you to trigger the lens manually. It will see what you see, notice when an interaction is helpful, and surface the option ambiently. The consent layer for that future looks a lot like GeraMind’s per-query consent.
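The shape of that guarantee fits in a few lines. This sketch assumes an invented `ask_user` callback and is not GeraMind’s actual API; it only encodes the rule that no suggestion surfaces without an explicit yes for that specific query:

```python
# Sketch of a per-query consent gate; names are invented and this is
# not GeraMind's actual API, only the shape of the guarantee.

def ambient_suggest(scene_object, action, ask_user):
    """Surface an action only after an explicit per-query yes.
    A 'no' means nothing appears on screen."""
    if ask_user(f"Suggest '{action}' for this {scene_object}?"):
        return action
    return None

ambient_suggest("restaurant", "book_table", lambda q: True)  # 'book_table'
```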
Honest caveats
AR glasses penetration is uncertain. If viable consumer AR hardware doesn’t ship at scale by 2028 — whether from Meta, Apple, or a surprise entrant — GeraLens remains a mobile-only product. That is still useful; it is just smaller than the full vision.
Help us design ambient discovery.
Join the waitlist