Computer Vision for
Retail Digitization.

A camera-first onboarding tool that turned a retailer’s stockroom into a published storefront — barcode, then image, then video recognition over a 60M+ SKU canonical catalog.

Daniel Manzela  ·  Active Jan 2020 — Apr 2024

Co-founded with Dr. Eli Osherovich — Senior Applied Scientist at Amazon (Alexa Shopping / Amazon Go), now leading AI & infrastructure at Google. Bootstrapped four years with no external capital.

~50
Products / hour onboarding target
3
Computer Vision generations shipped
60M+
Unique SKUs canonicalized
A retailer scanning a product on the shop floor during a Seller App field test
Field test · in-store first scan · May 2022
Market Context

The Onboarding Bottleneck

Physical retailers need an online presence but lack the technical skills. Manual product upload is the single largest barrier to eCommerce adoption.

Skill gap

Brick-and-mortar retailers operate without in-house digital teams. Photographing, captioning, categorizing, and listing products is a workflow they have neither time nor expertise to perform.

Manual upload cost

~3 minutes per product × 50 products = 2.5 hours per store before a single photograph is taken. That upload time is the rate-limiting step between a physical retailer and a working online storefront.

No frictionless path

No existing solution generated a complete product catalog from the camera alone. Every alternative still required typing, tagging, and image curation by hand.

Product Versions

Three Generations of Vision

The product evolved through three distinct computer-vision generations, each removing a layer of friction from the seller’s upload workflow.

VERSION 1.0

Barcode → auto-upload

Barcode + Database

An in-PWA barcode scanner powered by Google ML Kit (with zxing-cpp-emscripten as fallback). Each scan is enriched against the Global Product Database and pushed to a vendor’s WooCommerce store via the WC Marketplace REST API — the target was store onboarding plus the first 50 products in under one hour.

Version 1 flow: Camera → ML Kit barcode → GPD lookup (60M+ SKUs) → Auto-fill → WC publish
Scanner: Google ML Kit, on-device · zxing-cpp-emscripten fallback
Auto-fill: Global Product Database lookup per UPC / EAN
Inventory: bulk upload + per-product overrides
Target: store onboard + 50 products in < 1 hour
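The V1 scan-to-publish step can be sketched as a pure transform (all names here are illustrative, not the original codebase): a decoded barcode resolves against the Global Product Database, and a hit is mapped into a WooCommerce-style product payload; a miss falls through to the photo flow.

```typescript
// Illustrative sketch of the V1 flow: barcode → GPD lookup → WC payload.
interface GpdRecord {
  ean: string;
  title: string;
  description: string;
  categories: string[];
  imageUrl: string;
}

interface WcProductPayload {
  name: string;
  description: string;
  categories: { name: string }[];
  images: { src: string }[];
  meta_data: { key: string; value: string }[];
}

// Stand-in for the GPD lookup service (in production: a REST call keyed by UPC/EAN).
const gpd = new Map<string, GpdRecord>([
  ["7290000000001", {
    ean: "7290000000001",
    title: "Olive Oil 750ml",
    description: "Extra-virgin olive oil.",
    categories: ["Pantry", "Oils"],
    imageUrl: "https://example.com/olive-oil.jpg",
  }],
]);

function buildWcPayload(barcode: string): WcProductPayload | null {
  const record = gpd.get(barcode);
  if (!record) return null; // no GPD hit → fall back to the V2 photo flow
  return {
    name: record.title,
    description: record.description,
    categories: record.categories.map((name) => ({ name })),
    images: [{ src: record.imageUrl }],
    meta_data: [{ key: "ean", value: record.ean }],
  };
}
```

The payload shape mirrors the WooCommerce REST products endpoint, which is what makes the auto-fill step a simple POST once the lookup succeeds.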

Platform foundations shipped alongside V1 — sign-up, orders, marketing, settings, WooCommerce integration — and were inherited by every later version. See Architecture.

VERSION 2.0

Image recognition

Image + OCR

For products without scannable barcodes, a single photo became the input. Vision-AI OCR extracted on-package text, generated descriptions, tags, categories, and attributes, and a Computer Vision foundation-model API cleaned up product photography by removing backgrounds.

Version 2 flow: Photo → Vision OCR (on-package text) → GPD enrich (attributes · tags) → CV cleanup (bg removal) → WC
Input: single product photo from the camera
OCR: Vision AI text extraction from packaging
Generation: description, tags, categories, attributes
Cleanup: Computer Vision foundation-model background removal
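A minimal sketch of the post-OCR step, under stated assumptions: the actual generation used a Vision-AI service, so this stands in with a simple heuristic (longest OCR line becomes the draft title, the rest become candidate tags) just to show the shape of the transform.

```typescript
// Hypothetical post-processing of V2 OCR output into a draft listing.
// Heuristic stand-in for the Vision-AI generation step, not the real model.
interface DraftListing {
  title: string;
  tags: string[];
}

function draftFromOcr(ocrLines: string[]): DraftListing {
  // Assume the longest extracted line is the product title.
  const title = ocrLines.reduce((a, b) => (b.length > a.length ? b : a), "");
  // Remaining lines become candidate tags, trimmed, lowercased, deduplicated.
  const tags = [...new Set(
    ocrLines.filter((l) => l !== title).map((l) => l.trim().toLowerCase()),
  )];
  return { title, tags };
}
```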
VERSION 3.0

Video + RAG over the GPD

Video + Retrieval

The seller pans the camera across a shelf; frame-by-frame recognition retrieves matching SKUs from a dedicated RAG model over the Global Product Database. Attribute extraction expanded with synonym support, and the underlying dataset was mass-scraped per country, region, and language.

Version 3 flow: Video (pan across shelf) → Frame split (N frames / sec) → RAG over GPD (retrieval-grounded) → Draft batch (multi-SKU) → WC
Input: video stream — frame-by-frame product recognition
Retrieval: dedicated RAG over Global Product Database
Attributes: enhanced extractor with synonym support
Coverage: per country / region / language scraping
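The batch-assembly step at the end of the V3 flow can be sketched as follows (names and threshold are illustrative): per-frame retrieval results are aggregated across the whole pan so each SKU appears once in the draft batch, keeping only its best-scoring match.

```typescript
// Illustrative aggregation of per-frame RAG matches into a multi-SKU draft batch.
interface FrameMatch {
  frame: number;  // frame index within the video
  sku: string;    // SKU retrieved from the GPD for this frame
  score: number;  // retrieval confidence in [0, 1]
}

function assembleDraftBatch(matches: FrameMatch[], minScore = 0.8): string[] {
  const best = new Map<string, number>();
  for (const m of matches) {
    if (m.score < minScore) continue; // drop low-confidence retrievals
    const prev = best.get(m.sku) ?? 0;
    if (m.score > prev) best.set(m.sku, m.score); // keep best score per SKU
  }
  // Highest-confidence SKUs first in the draft batch.
  return [...best.entries()].sort((a, b) => b[1] - a[1]).map(([sku]) => sku);
}
```

Deduplicating per SKU is what turns a noisy frame stream into a publishable draft: the same product seen in twenty frames still yields one listing.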
Data Infrastructure

The Global Product Database

We did not ask the model to invent products. We asked it to find them. Every Computer Vision generation was constrained by a record already in the catalog — barcode, photo, and video became three different ways into the same retrieval layer.

Vision & Mission

One catalog, every retailer

A single canonical product record per SKU, enriched once and reused across every vendor on the platform. A small bodega and a regional chain receive the same structured data the moment they scan a product.

Ingestion stack

From scrapers to multilingual RAG

  • Shipped  Web scraping — UPC / BarcodeLookup / image cropping
  • Shipped  Bright Data — managed proxy & structured ingestion
  • Planned  Multilingual RAG — per-region language coverage
System Design

Technical Architecture

A Progressive Web App over a WooCommerce-vendor backend, with a two-database split and a vision pipeline that swapped its front-end three times while the retrieval layer stayed stable.

AWS (compute) · MongoDB (two DBs) · React PWA · WC Marketplace REST API · WCMp Vendors API · Google ML Kit · Mixpanel · GA4 · CI / CD

PWA topology

Single React-based PWA installable across iOS, Android, and desktop. Camera, push, and offline capabilities without an app-store dependency — updates rolled out the moment a seller refreshed.

  • RTL + Hebrew — Google Auto-Translate for the launch market
  • Push notifications — react-web-notification (orders, low-stock)
  • Analytics — Mixpanel events / HEART funnel · GA4 acquisition

Two-database model

The architectural backbone. The catalog and the storefront were never the same thing — one is shared across vendors, the other is private to each vendor.

  • Global Product Database — shared, canonical, retrieval-indexed (60M+ SKUs)
  • Store Product Database — vendor-scoped inventory, pricing, overrides
  • Integration — WC Marketplace REST API · WCMp Vendors auth
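The two-database split can be expressed as two record shapes and a merge (types are illustrative, not the original schema): the canonical GPD record is shared, the store record is vendor-scoped, and per-product overrides win when the listing is published.

```typescript
// Illustrative sketch of the two-database model.
interface GlobalProduct {   // Global Product Database: shared, canonical
  ean: string;
  title: string;
  description: string;
}

interface StoreProduct {    // Store Product Database: private to one vendor
  ean: string;
  vendorId: string;
  price: number;
  stock: number;
  titleOverride?: string;   // vendor override beats canonical data
}

// Merge a shared canonical record with one vendor's private state
// to produce the listing published to that vendor's storefront.
function toListing(global: GlobalProduct, store: StoreProduct) {
  return {
    title: store.titleOverride ?? global.title,
    description: global.description,
    price: store.price,
    stock: store.stock,
  };
}
```

The design point is that pricing and stock never live in the shared catalog, and enrichment never lives in the vendor store, so each side can change without touching the other.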

Vision stack

The product replaced its CV front-end three times. The retrieval layer behind it stayed stable — which is why a four-year stack didn’t accumulate as much technical debt as it could have.

  • V1 — ML Kit on-device barcode scanning + GPD lookup
  • V2 — Vision-AI OCR + foundation-model background removal
  • V3 — RAG retrieval over the GPD on video frames
In the Field

Evidence from the Floor

Raw field documentation from physical-store testing — the seller, the camera, the product, and the upload happening in real time.

Captured Session · Barcode Scanner

Barcode scanning workflow

Raw testing session: scanning physical barcodes, retrieving product data from the GPD, auto-filling and publishing to the storefront.

What you’ll see: A handheld camera centers a product barcode; the app reads it on-device, retrieves the matching record from the Global Product Database, and pre-fills the listing form, ready for the vendor to confirm and publish to WooCommerce.
Captured Session · Computer Vision

Background removal & image normalization

Testing the Computer Vision pipeline: transforming raw smartphone photos into commercial-grade product images with automatic background removal.

What you’ll see: A product photographed against a cluttered shop-floor background; the foundation-model cleanup pipeline removes the background and normalizes the image to a uniform, commerce-ready presentation suitable for direct publish.
Field testing the Seller App with end-users inside a retail location
Field testing · May 2022

End-user usability session

Watching a real seller perform their first scan on the floor of their own store.

Daniel Manzela working with a retailer during in-store Seller App field testing
Field testing · May 2022

On-floor co-discovery

Co-discovery work alongside the retailer — observing where the camera workflow broke.

Metrics

Traction & KPIs

Bootstrapped to $10K MRR from SMB customers over four years. Instrumented from day one against the HEART framework — see Pivot for why the revenue line stopped scaling and what we did about it.

4 yrs
Bootstrapped, no external capital
~50
Products / hour onboarding target
$10K
MRR at plateau, from SMB customers (see Pivot)
HEART Framework

Five axes wired to the roadmap

  • Happiness — in-app survey + NPS after onboarding
  • Engagement — scans / session, products / week
  • Adoption — first scan, first published product, first sale
  • Retention — week-2 / week-4 / month-3 cohorts
  • Task Success — completed-upload rate per session
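A HEART-aligned event schema of the kind described above might look like this (event and property names are illustrative, not the original Mixpanel schema): each event declares which HEART axis it feeds, so per-axis funnels can be built without retagging.

```typescript
// Illustrative HEART-aligned analytics event shape.
type HeartAxis =
  | "happiness"
  | "engagement"
  | "adoption"
  | "retention"
  | "task_success";

interface AnalyticsEvent {
  name: string;
  axis: HeartAxis;                          // which HEART signal this event feeds
  props: Record<string, string | number>;   // funnel dimensions
}

// Example: a completed upload feeds the Task Success axis.
function uploadCompleted(sessionId: string, skuCount: number): AnalyticsEvent {
  return {
    name: "upload_completed",
    axis: "task_success",
    props: { session_id: sessionId, sku_count: skuCount },
  };
}
```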

Cohort, NPS, and task-success figures held under prior NDA. Instrumentation was production-grade — Mixpanel events with a HEART-aligned schema, GA4 for acquisition, and a roadmap that took a backlog item only when a HEART signal moved.

Backlog at deprecation

What customer pain told us next

  • Online payment integration
  • Last-mile delivery integration
  • Point-of-sale (POS) sync
  • Weight-based products (deli, produce, bulk)

Each pain item entered the backlog with the user interview attached. Roadmap priority was a function of frequency × revenue impact, not vendor preference. We shipped the pivot instead of these.
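The scoring rule above (frequency × revenue impact) reduces to a one-line sort; a minimal sketch with hypothetical item names:

```typescript
// Illustrative backlog ranking: priority = interview frequency × revenue impact.
interface PainItem {
  name: string;
  frequency: number;      // how many user interviews raised it
  revenueImpact: number;  // estimated revenue effect, arbitrary units
}

function rankBacklog(items: PainItem[]): PainItem[] {
  return [...items].sort(
    (a, b) => b.frequency * b.revenueImpact - a.frequency * a.revenueImpact,
  );
}
```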

Strategic Decision

Why We Pivoted

Field data revealed an incentive problem the product could not engineer around — so we changed the customer.

The friction we couldn’t code away

Retailers and their on-ground employees were not incentivized to perform the setup themselves. Onboarding stalled. Production roll-out kept slipping past the moment the product was ready.

The discovery

Big-box retailers — operating eCom sites without local visibility — turned out to be the right buyer for a frictionless, automated, large-scale eCom-per-location setup. The need was upstream of the seller.

The Covid-19 window

The pandemic was the right macro window for digital transformation in retail — and it was precisely the window that exposed the human-on-the-ground friction at scale.

The outcome

Pivoted from SMB self-service to a frictionless enterprise offering. That move directly seeded the Autonomous Content Pipeline — and the open-source spinout — same insight, different blast radius.

Two-stage wind-down. April 2024 — active Seller App development ended; team rolled to enterprise. The Computer Vision stack and the Global Product Database carried forward as feedstock for the next product generation.

Reflection

What I’d Do Differently

Three concrete things, named honestly. Not platitudes — the operational detail senior recruiters and operators recognize.

L · 01

Confused a sales / ops problem for a UX problem.

Seller incentive friction looked like “the onboarding flow is too hard” for nine months longer than it should have. The fix was not another camera generation — it was a field-ops person who could sit next to the seller and earn the first 50 SKUs. I’d hire that person before shipping V2.

L · 02

Made the right architectural bet for the wrong reason.

Building the GPD as a retrieval layer (not as model fine-tuning data) felt like defensive engineering at the time. It turned out to be the only piece of the stack that survived the pivot intact and is the load-bearing pattern of every product since. I’d commit to retrieval-grounded CV from day one on the next zero-to-one.

L · 03

Let the HEART framework decide too late.

We instrumented the funnel correctly from week one and then let revenue conversations override what HEART signals were saying for a full quarter. When the retention cohort and the MRR line disagree, the cohort is right earlier. I’d put a forcing function on that — a monthly review where the cohort number has veto power over the roadmap.