
Ruby

The manual killer. A real-time concierge that saw through the customer's camera, answered with contextual video, and adapted to their exact setup — the brand's best engineer, standing right there beside them.

AI Concierge · Computer Vision · Post-Purchase · AppliedMind
Ruby - an AI product concierge from AppliedMind

What Ruby did

Ruby was an interactive guide that met your customers the moment they unboxed. Not a chatbot. Not a FAQ. A real-time concierge that saw what they saw, answered what they asked, and walked them through every step using their camera, your knowledge, and contextual video.

Support was never meant to be a static PDF or a long hold time. Ruby made it feel like the brand's best engineer was standing right there beside them.

Why the market cares

The story was never “another support bot.” Ruby was a product concierge — plugged into the brand's own customer data so it could resolve the issue in the moment, and, more importantly, capture what happened. The end goal was the data: insights from real customers in real failure, fed back to the people designing the next version of the product.

The feedback loop for product design

Ruby was more than a product concierge or the ability to FaceTime your product. It was the feedback loop for product design.

The loudest signal in the field is the moment a customer reaches for help — and almost none of it ever reaches the people designing the next version. Tickets get categorized for resolution speed, summaries get drained of everything specific, and by the time anything lands in a design review it's a bar chart of complaint categories. The room the product sits in, the eight-second hesitation, the workaround the user invented — gone.

Ruby was instrumented at exactly that moment. Because it ran through the customer's camera while they were stuck, every session was effectively a structured field study: real environment, real user, real failure, captured automatically with video evidence of where the design gave way.

That made a different kind of artifact possible for the design team. Not “users complain about pairing,” but: forty-seven users hesitated more than eight seconds at step three this month, here are the clips, ranked by frustration signal, split by firmware version. A friction cluster could be routed into Linear or Figma as a design ticket with video attached, not buried in a quarterly slide.
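To make that concrete, here is a sketch of what such an artifact could look like, written in TypeScript. Every name is hypothetical rather than the shipped schema: a cluster record rolled up from sessions, routed onward as a design ticket only once the friction recurs.

```typescript
// Illustrative shape of a friction cluster rolled up from Ruby sessions.
// All field names here are assumptions, not the production schema.
interface FrictionCluster {
  step: string;                // e.g. "pairing-step-3"
  firmwareVersion: string;     // split key, e.g. "2.2"
  sessionCount: number;        // sessions that hit this friction in the period
  medianHesitationMs: number;  // how long users stalled before asking for help
  clipUrls: string[];          // video evidence, ranked by frustration signal
}

// Stand-in for a Linear or Figma integration; here it just logs the ticket.
async function createDesignTicket(ticket: { title: string; attachments: string[] }) {
  console.log("design ticket:", ticket.title, `(${ticket.attachments.length} clips)`);
}

// Route a recurring cluster to the design team as a ticket with clips attached.
async function routeToDesign(cluster: FrictionCluster): Promise<void> {
  if (cluster.sessionCount < 10) return; // ignore one-off stumbles
  await createDesignTicket({
    title: `${cluster.sessionCount} users hesitated >8s at ${cluster.step} (fw ${cluster.firmwareVersion})`,
    attachments: cluster.clipUrls,
  });
}
```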

And because Ruby kept watching after every fix shipped, we could close the loop the other direction too — did the v2.3 firmware actually move the cluster, or did the friction just relocate? That is what closing the loop to design actually means: the same expert lens that helped the customer in the kitchen, turned around and pointed at the engineer designing the next version.

Product experience

From AR-first to phone-first

The original vision was AR-first. A natural extension of the situated displays and remote-expert overlays from my PhD work — let customers troubleshoot hands-free, with virtual aids registered to the actual device in front of them. The design language was already there; the hardware story was the bottleneck.

The AR lift turned out to be too heavy for a post-purchase product. Headset adoption is near zero in the home, content authoring for AR is its own discipline, and registration breaks the moment someone picks up the box. The pivot went two directions, both lighter. First, agentic web crawlers turned a brand's existing materials — site, PDFs, support articles, product videos — into structured knowledge automatically. Second, we showed that knowledge through the customer's phone camera and voice in the spirit of Google's Project Astra — same expert-looking-over-your-shoulder feel, none of the hardware barrier.
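As a rough illustration of the first half, the sketch below shows the kind of structured chunk a crawler could emit from a brand's existing pages. The shape, field names, and helper are assumptions for illustration, not the shipped pipeline; real extraction of PDFs and videos is omitted.

```typescript
// Illustrative shape of the structured knowledge the crawlers produced;
// the schema here is an assumption, not the shipped one.
interface KnowledgeChunk {
  source: string;        // URL of the page, PDF, or support article
  productModel: string;  // which product the chunk applies to
  topic: string;         // e.g. "pairing", "filter replacement"
  text: string;          // normalized text handed to the agent for retrieval
}

// Minimal crawl step using the standard fetch API: pull a support page and
// strip it down to plain text for one topic of one product.
async function fetchAsChunk(url: string, productModel: string, topic: string): Promise<KnowledgeChunk> {
  const html = await (await fetch(url)).text();
  const text = html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
  return { source: url, productModel, topic, text };
}
```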

My role

I was Ruby's product designer — the customer-facing flow, the operator dashboard, the way the camera session and the feedback loop fit together. I also bootstrapped the stack: a multi-modal LLM for the conversational agent, a custom RF-DETR vision model trained to recognize each brand's product parts, WebRTC for the live video call, an iOS app with ARKit, and a Next.js / TypeScript / Node.js / Firebase operator dashboard.

A few architectural decisions worth naming. On vision, the starting hypothesis was per-brand VLM fine-tuning. The better answer turned out to be a much smaller, purpose-trained RF-DETR model that learned to find the specific bits and parts of a given product — faster, cheaper, more reliable than a general VLM, and easier to ship per customer.
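A hedged sketch of how a per-brand detector like that gets consumed, assuming it sits behind a simple HTTP inference endpoint. The endpoint, response shape, and threshold are illustrative, not the production API.

```typescript
// Hypothetical response shape from a per-brand part detector.
interface PartDetection {
  label: string;        // part name the model was trained on, e.g. "status-led"
  confidence: number;   // 0..1
  box: { x: number; y: number; width: number; height: number }; // pixels in the frame
}

// Post a single camera frame to the detector and keep confident detections,
// so the agent grounds its answer on parts it can actually see.
async function detectParts(frame: Blob, endpoint: string): Promise<PartDetection[]> {
  const body = new FormData();
  body.append("image", frame, "frame.jpg");
  const res = await fetch(endpoint, { method: "POST", body });
  const detections: PartDetection[] = await res.json();
  return detections.filter((d) => d.confidence > 0.5);
}
```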

On video, modern live VLMs cap out around one frame per second, but the things that matter in the field — a status LED blinking, a hand briefly entering frame, the user pulling the device closer — happen faster than that. Ruby ran the call over WebRTC and made the full stream available alongside the sampled frames, so the agent had context the model couldn't see directly.
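A minimal browser-side sketch of the sampling half, assuming the call is already running over WebRTC: grab a frame from the video element roughly once a second for the model, while the stream itself keeps running at full rate for recording and review. The function name and interval are illustrative.

```typescript
// Sample frames from a playing video element (local or remote WebRTC track)
// at roughly one frame per second and hand each frame to a callback.
function sampleFrames(video: HTMLVideoElement, onFrame: (frame: Blob) => void, intervalMs = 1000) {
  const canvas = document.createElement("canvas");
  const ctx = canvas.getContext("2d");
  if (!ctx) return;

  setInterval(() => {
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    ctx.drawImage(video, 0, 0);                                   // grab the current frame
    canvas.toBlob((blob) => blob && onFrame(blob), "image/jpeg", 0.8);
  }, intervalMs);
}
```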

On clients, the first prototype annotated over the ARKit depth mesh; customer testing made it clear the install step was the failure point, not the AR. The shipping path became QR-code-to-web — scan, allow camera, you're in.
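The joining step itself reduces to the standard browser camera APIs. A minimal sketch of what "allow camera, you're in" looks like once the QR code has resolved to the session page:

```typescript
// Ask for the rear camera plus microphone and attach the stream to the page.
// This stream is what would then feed the WebRTC call.
async function startCameraSession(videoEl: HTMLVideoElement): Promise<MediaStream> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" },   // rear camera, pointed at the product
    audio: true,                            // voice channel for the concierge
  });
  videoEl.srcObject = stream;
  await videoEl.play();
  return stream;
}
```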

My co-founder, Safwan Siddiqui, led outreach and customer development.

Ruby went to private beta with brands before AppliedMind wound down.