From Open Data
to Ranked Prospects.

A bespoke five-stage scraping, structuring, and classification pipeline — built from scratch for a market existing data platforms didn’t serve.

Daniel Manzela · Israeli Financial Services Firm (wealth-management focus) · Feb 2019 — Jul 2020
17 mo · In Production
₪50M+ · New AUM Generated
Solo · Engineer
5 · Pipeline Stages
The Problem

No Vendor for the Job.

A small Israeli financial services firm needed targeted prospects. The available enterprise data platforms didn’t serve the market — wrong language, wrong geography, wrong price point.

Generic Lists, Low Conversion

Sales worked from purchased lead lists with no geographic or income filtering. Conversion was poor because the base itself was wrong: reps were cold-calling people who had no need for wealth management or pension planning.

No Vendor for This Market

Enterprise data platforms (ZoomInfo, Clearbit, Apollo) were US-focused, priced at enterprise tier, and had thin coverage of Israeli business and professional records. None of them indexed the Hebrew-language sources where the actual prospects lived.

Geographic Targeting by Hand

Filtering prospects by income tier, neighborhood, or postal code — the dimensions that actually predicted whether someone needed financial services — required a custom layer that no off-the-shelf tool offered for this region.

The Build

Five Stages, Manually Orchestrated.

Each stage was a standalone tool with a single job. Nothing was end-to-end automated — I ran, validated, and handed off each component myself. Outputs landed on the sales team’s shared drive in Excel.

01

Target Definition

Israeli postal codes and neighborhoods classified into income tiers using public demographic and real-estate signals. The geo-target list became the scope for every downstream stage — collapsing the search space before a single page was fetched.
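
A minimal sketch of that tiering pass, under two assumptions the write-up doesn’t confirm: that stage 01 ran in the same Excel + VBA environment as the later stages, and that the lookup lived in hypothetical “GeoTiers” and “Prospects” sheets:

Sub TagGeoTiers()
    Dim tiers As Object, geo As Worksheet, pros As Worksheet, r As Long
    Set tiers = CreateObject("Scripting.Dictionary")

    ' Build the postal-code -> tier lookup (col A: code, col B: tier 1-3)
    Set geo = ThisWorkbook.Worksheets("GeoTiers")
    For r = 2 To geo.Cells(geo.Rows.Count, 1).End(xlUp).Row
        tiers(CStr(geo.Cells(r, 1).Value)) = geo.Cells(r, 2).Value
    Next r

    ' Tag each prospect (col C: postal code) with its tier (col D)
    Set pros = ThisWorkbook.Worksheets("Prospects")
    For r = 2 To pros.Cells(pros.Rows.Count, 3).End(xlUp).Row
        If tiers.Exists(CStr(pros.Cells(r, 3).Value)) Then
            pros.Cells(r, 4).Value = tiers(CStr(pros.Cells(r, 3).Value))
        End If
    Next r
End Sub

Records whose postal code isn’t on the geo-target list never receive a tier, so every downstream stage can drop them before doing any work.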

02

Extraction

WebScraper (open-source) handled local extraction from registries, directories, listings, and municipal sites. BrightData’s Residential Proxy provided rotating IPs and anti-bot bypass for rate-limited and geo-restricted sources.
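
The extraction itself ran through WebScraper rather than custom code, but the proxy pattern is worth sketching. A hedged VBA example of a fetch routed through a rotating residential proxy; the host and credentials are placeholders, and BrightData-style proxies accept standard proxy authentication:

Function FetchViaProxy(url As String) As String
    Dim http As Object
    Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
    http.SetProxy 2, "proxy.example.com:22225"        ' 2 = use the named proxy server
    http.Open "GET", url, False
    http.SetCredentials "PROXY_USER", "PROXY_PASS", 1 ' 1 = credentials apply to the proxy
    http.Send
    FetchViaProxy = http.ResponseText
End Function

A rotating pool means successive requests exit through different residential IPs, which is what defeats per-IP rate limits and geo blocks.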

03

Structuring

Custom VBA macros in Excel handled normalization, enrichment, formatting, and dedup, turning raw, messy data into a structured, highly filterable index with multi-search and sort capabilities. Every record had a known shape and a unique identity before any score touched it.
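
A minimal sketch of the dedup half of that stage, with hypothetical columns: a normalized name-plus-phone key gives each record its unique identity, and later occurrences are flagged rather than deleted:

Sub DedupeRecords()
    Dim seen As Object, ws As Worksheet, r As Long, key As String
    Set seen = CreateObject("Scripting.Dictionary")
    Set ws = ThisWorkbook.Worksheets("Prospects")
    For r = 2 To ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
        ' Identity = lower-cased name (col A) + digits-only phone (col B)
        key = LCase$(Trim$(CStr(ws.Cells(r, 1).Value))) & "|" & DigitsOnly(CStr(ws.Cells(r, 2).Value))
        If seen.Exists(key) Then
            ws.Cells(r, 5).Value = "DUPLICATE"   ' flag, don't delete: keeps the audit trail
        Else
            seen.Add key, r
        End If
    Next r
End Sub

Function DigitsOnly(s As String) As String
    Dim i As Long
    For i = 1 To Len(s)
        If Mid$(s, i, 1) Like "#" Then DigitsOnly = DigitsOnly & Mid$(s, i, 1)
    Next i
End Function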

04

Classification

Each record was scored against weighted signals — geo-tier, profession, business-ownership indicator, cross-source coverage — to produce a ranked prospect list. Weights were tuned manually between batches based on conversion-rate feedback from the sales team.
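
A sketch of the scorer’s shape. The weights, the three-tier geo scale (3 = highest income), and the coverage cap are illustrative stand-ins, not the tuned values:

Function ProspectScore(geoTier As Long, professionMatch As Boolean, _
                       ownsBusiness As Boolean, sourceCount As Long) As Double
    Const W_GEO = 0.4, W_PROF = 0.25, W_OWNER = 0.2, W_COVER = 0.15
    Dim s As Double
    s = W_GEO * (geoTier / 3)                          ' geo tier, normalized to 0-1
    If professionMatch Then s = s + W_PROF             ' profession on the target list
    If ownsBusiness Then s = s + W_OWNER               ' business-ownership indicator
    s = s + W_COVER * WorksheetFunction.Min(sourceCount / 3, 1) ' cross-source coverage, capped
    ProspectScore = s
End Function

A pure function like this keeps retuning cheap: change four constants, re-run the batch, re-rank.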

05

Delivery

The final stage exported themed Excel workbooks — sorted by region, profession, and score band — to the sales team’s shared drive. The team never had to interact with the pipeline; they just opened a file in the tool they already used.
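
A minimal sketch of one themed export, assuming a hypothetical “Ranked” sheet with region in column C. A routine like this, called once per region or score band, drops one workbook per theme onto the shared drive:

Sub ExportRegion(region As String, outDir As String)
    Dim src As Worksheet, out As Workbook
    Set src = ThisWorkbook.Worksheets("Ranked")

    src.Range("A1").AutoFilter Field:=3, Criteria1:=region  ' col C = region
    src.AutoFilter.Range.Copy                               ' copies only the visible, filtered rows

    Set out = Workbooks.Add
    out.Worksheets(1).Range("A1").PasteSpecial xlPasteValues
    out.SaveAs Filename:=outDir & "\" & region & ".xlsx", FileFormat:=xlOpenXMLWorkbook
    out.Close SaveChanges:=False

    src.AutoFilterMode = False                              ' leave the source sheet clean
End Sub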

Engineering principle

A pipeline is a series of filters, not a series of steps. Each stage shrinks the input for the next: geo collapses scope before a page is fetched, dedup collapses records before a score is computed. That discipline is what kept a manually orchestrated system tractable.

The scraping was the easy part.
The classifier was the product.

Reflection

What This Project Taught

The lessons from a 17-month custom pipeline outlasted the pipeline itself. They recurred, at much larger scale, in everything that came after.

Stack
WebScraper · extraction
BrightData · proxy, anti-bot
Excel + VBA · normalization
VBA Macros · classification
Excel · output, delivery
Workstation · infrastructure
01
Quality over volume. Early batches converted at near-zero. Tighter geo-filtering and stricter scoring thresholds reduced volume and lifted conversion — the same lesson that recurred at Tasko.ai.
02
Narrow domain, custom pipeline. Enterprise tools generalize at the cost of the specializations that mattered here: language, geography, target profile. A scoped pipeline ran cheaper and ranked better.
03
The classifier is the product. Anyone can scrape. The value lived in the layer between raw data and a sales team’s decision, and that layer was earned through iteration on real feedback.
04
A human in the loop doesn’t scale. I was the orchestrator for every handoff between stages. The pipeline worked — but I was the bottleneck. That realization drove the shift toward end-to-end automation in everything that followed.

For seventeen months the sales team worked from leads generated by this pipeline. The converted prospects represented over ₪50M in new assets under management. Cost: a workstation and one engineer.

The same pattern — extract, classify, deliver in the format the consumer already uses — recurred at larger scale: Tasko.ai and the Pipeline Observatory. The earliest version of that thinking lived here.