
Ruby

The manual killer. A real-time product concierge that sees through the customer's camera, answers with verbal instructions and contextual video clips, and adapts to their exact setup — the brand's best engineer, standing right there beside them.

AI Concierge · Computer Vision · Post-Purchase · AppliedMind

What Ruby does

Ruby is an interactive guide that meets your customers the moment they unbox. Not a chatbot. Not a FAQ. A real-time concierge that sees what they see, answers what they ask, and walks them through every step using their camera, your knowledge, and contextual video.

Support was never meant to be a static PDF or a long hold time. Ruby makes it feel like the brand's best engineer is standing right there beside them.

Why hardware brands care

Ruby isn't another support bot. It's a way for a brand to bring its entire knowledge base — manuals, FAQs, everything its support team has learned on calls — to users as if they were on a FaceTime call with the brand's best engineer. That handles onboarding and troubleshooting, and, more important, it opens a window into what happens in users' homes. Ruby is a scalable forward-deployed engineer for every product — and like any good FDE, it comes back with what it learns, so the brand can build a better next version.

The feedback loop for product design

Ruby is more than a product concierge, more than the ability to FaceTime your product. It is the feedback loop for product design.

The loudest signal in the field is the moment a customer reaches for help — and almost none of it ever reaches the people designing the next version. Tickets get categorized for resolution speed, summaries get drained of everything specific, and by the time anything lands in a design review it's a bar chart of complaint categories. The room the product sits in, the eight-second hesitation, the workaround the user invented — gone.

Ruby is instrumented at exactly that moment. Because it runs through the customer's camera while they are stuck, every session is effectively a structured field study: real environment, real user, real failure, captured automatically with video evidence of where the design gives way.

That makes a different kind of artifact possible for the design team. Not “users complain about pairing,” but: 80% of users struggle to put the device into pairing mode — there's no feedback or indication that they need to hold for longer than five seconds. Here are the clips, ranked by frustration signal, split by firmware version.
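As a minimal sketch of how a report like that could be assembled, the TypeScript below groups session clips by firmware version and ranks each group by a frustration score. The `Clip` shape, the `frustration` field, and the function name are hypothetical, chosen for illustration; they are not Ruby's actual schema.

```typescript
// Hypothetical clip record; field names are assumptions for illustration.
interface Clip {
  id: string;
  firmware: string;    // firmware version the session ran on, e.g. "2.3"
  frustration: number; // assumed score in [0, 1]; higher means more frustrated
}

// Group clips by firmware version, then rank each group by frustration, highest first.
function rankClipsByFirmware(clips: Clip[]): Map<string, Clip[]> {
  const groups = new Map<string, Clip[]>();
  for (const clip of clips) {
    const group = groups.get(clip.firmware) ?? [];
    group.push(clip);
    groups.set(clip.firmware, group);
  }
  groups.forEach((group) => group.sort((a, b) => b.frustration - a.frustration));
  return groups;
}
```

A design review could then page through `rankClipsByFirmware(clips).get("2.3")` and watch the most frustrated sessions for that firmware first.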

And because Ruby keeps watching after every fix ships, we can close the loop in the other direction too: did the v2.3 firmware actually move the cluster, or did the friction just relocate? That is what closing the loop to design means: the same expert lens that helps the customer in the kitchen, turned around and pointed at the engineer designing the next version.

Product experience

AI-native onboarding of brands

Onboarding a new brand takes hours, not weeks. Agentic crawlers ingest everything the brand already has — its site, manuals, support articles, and product videos — and turn it into structured knowledge.

What comes back isn't a black box. It's a white-box dashboard built from those same materials: editable how-to guides the brand can read, correct, and reuse — internally, or published straight to their website. When something's wrong, the brand edits the guide, not an agent prompt. No one has to learn prompt engineering to keep Ruby accurate.

From AR-first to phone-first

The original vision was AR-first, a natural extension of the situated displays and remote-expert overlays from my PhD work: let customers troubleshoot hands-free, with virtual aids registered to the actual device in front of them. The design language was already there; the hardware story was the bottleneck.

The AR lift turned out to be too heavy for a post-purchase product. Headset adoption is near zero in the home, content authoring for AR is its own discipline, and registration breaks the moment someone picks up the box. The pivot went in two directions, both lighter. Authoring became automatic, built from the brand's existing materials instead of hand-made for AR. And delivery moved from a headset to the customer's phone camera and voice, in the spirit of Google's Project Astra: the same expert-looking-over-your-shoulder feel, none of the hardware barrier.

My role

I was Ruby's product designer — the customer-facing flow, the operator dashboard, the way the camera session and the feedback loop fit together. I also bootstrapped the stack: a multi-modal LLM for the conversational agent, a custom RF-DETR vision model trained to recognize each brand's product parts, WebRTC for the live video call, an iOS app with ARKit, and a Next.js / TypeScript / Node.js / Firebase operator dashboard.

A few architectural decisions worth naming. On vision, the starting hypothesis was per-brand VLM fine-tuning. The better answer turned out to be a much smaller, purpose-trained RF-DETR model that learned to find the specific bits and parts of a given product — faster, cheaper, more reliable than a general VLM, and easier to ship per customer.
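To illustrate what the agent might do with a part detector's output, here is a minimal TypeScript sketch that keeps the single most confident detection per part and drops weak ones. The `Detection` shape, the label names, and the 0.5 default threshold are assumptions for illustration; they are not the RF-DETR library's API or Ruby's actual post-processing.

```typescript
// Hypothetical detector output for one frame; not an RF-DETR type.
interface Detection {
  label: string;                         // part name, e.g. "reset_button" (assumed)
  confidence: number;                    // detector score in [0, 1]
  box: [number, number, number, number]; // x, y, width, height in pixels
}

// Keep the highest-confidence detection for each part label,
// ignoring anything below the (assumed) confidence threshold.
function bestPerPart(detections: Detection[], minConfidence = 0.5): Map<string, Detection> {
  const best = new Map<string, Detection>();
  for (const d of detections) {
    if (d.confidence < minConfidence) continue;
    const current = best.get(d.label);
    if (!current || d.confidence > current.confidence) best.set(d.label, d);
  }
  return best;
}
```

The resulting map gives the conversational agent one location per named part, which is the kind of compact, product-specific signal a small detector can supply and a general VLM would have to infer.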

On video, live VLM sessions typically sample at around one frame per second, but the things that matter in the field (a status LED blinking, a hand briefly entering frame, the user pulling the device closer) happen faster than that. Ruby ran the call over WebRTC and made the full stream available alongside the sampled frames, so the agent had context the model couldn't see directly.
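One way the full-rate stream could supply that missing context, sketched in TypeScript: measure a fast event (here, an LED blink rate) across the frames between two samples and pass it to the agent as a short text note. This is an illustrative assumption, not a description of Ruby's pipeline; `LedSample`, the upstream "is the LED lit" check, and both function names are hypothetical.

```typescript
// Hypothetical per-frame observation taken from the full-rate stream.
interface LedSample {
  tMs: number; // frame timestamp in milliseconds
  on: boolean; // whether the LED region looked lit (assumed upstream check)
}

// Count off-to-on transitions in the window and express them per second.
function blinksPerSecond(samples: LedSample[]): number {
  if (samples.length < 2) return 0;
  let risingEdges = 0;
  for (let i = 1; i < samples.length; i++) {
    if (!samples[i - 1].on && samples[i].on) risingEdges++;
  }
  const durationSec = (samples[samples.length - 1].tMs - samples[0].tMs) / 1000;
  return durationSec > 0 ? risingEdges / durationSec : 0;
}

// Turn the measurement into a text note that could accompany a sampled frame.
function describeLed(samples: LedSample[]): string {
  const rate = blinksPerSecond(samples);
  if (rate === 0) {
    const lit = samples.length > 0 && samples[samples.length - 1].on;
    return lit ? "LED steady on" : "LED off";
  }
  return `LED blinking at about ${rate.toFixed(1)} Hz`;
}
```

A single frame sampled once per second would show the LED as simply on or off; the note carries the blink pattern that only the intermediate frames reveal.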

On clients, the first prototype annotated over the ARKit depth mesh; customer testing made it clear the install step was the failure point, not the AR. The shipping path became QR-code-to-web — scan, allow camera, you're in.
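A minimal sketch of the kind of link a QR code could encode for that flow, in TypeScript. The base URL, the `/session` path, and the `brand` and `product` parameter names are all hypothetical; the source does not describe Ruby's actual URL scheme.

```typescript
// Build the link a QR code on the box or device could encode.
// Path and parameter names are assumptions for illustration.
function buildSessionUrl(baseUrl: string, brand: string, productId: string): string {
  const url = new URL("/session", baseUrl);
  url.searchParams.set("brand", brand);     // which brand's knowledge to load
  url.searchParams.set("product", productId); // which product the customer scanned
  return url.toString();
}
```

Because the link opens in the phone's browser, the only remaining step is the camera permission prompt; there is no app to install first.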

My co-founder, Safwan Siddiqui, led outreach and customer development.

Ruby went to private beta with brands before AppliedMind wound down.