Automated Lead Scoring and Outreach Engine

A production-grade system that automatically enriches, scores, routes, and generates outreach for companies in a live HubSpot CRM. Built on real company data (health systems) across 7 interconnected components.

The core system is fully live against a real HubSpot portal with real company data. The CRM connector, territory router, composite scorer, ML model, and dashboard all run against production CRM records.

Two areas are partially stubbed.

Lead enrichment is fully built, but it runs without a People Data Labs account. Without a valid API key, companies get marked as failed and the pipeline moves on gracefully. In production you would swap in the key and enrichment would populate automatically. The integration point is already there.
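As a rough sketch of that graceful-failure behavior (not the project's actual code), the enrichment step can return a "failed" result instead of raising when no key is configured. The names `EnrichmentResult`, `enrich_company`, `enrich_batch`, and the `fetch` callable are all hypothetical; `fetch` stands in for whatever function calls the enrichment provider.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class EnrichmentResult:
    company_id: str
    status: str                          # "enriched" or "failed"
    data: Dict[str, object] = field(default_factory=dict)
    reason: Optional[str] = None         # why enrichment failed, if it did


def enrich_company(
    company_id: str,
    domain: str,
    fetch: Callable[[str, str], Dict[str, object]],  # hypothetical provider call
    api_key: Optional[str] = None,
) -> EnrichmentResult:
    """Enrich one company; never raise, so the pipeline can keep going."""
    if not api_key:
        # No key configured: mark the company as failed and move on.
        return EnrichmentResult(company_id, "failed", reason="missing_api_key")
    try:
        return EnrichmentResult(company_id, "enriched", data=fetch(domain, api_key))
    except Exception as exc:
        # Provider errors are recorded on the result rather than propagated.
        return EnrichmentResult(company_id, "failed", reason=str(exc))


def enrich_batch(
    companies: List[Tuple[str, str]],    # (company_id, domain) pairs
    fetch: Callable[[str, str], Dict[str, object]],
    api_key: Optional[str] = None,
) -> List[EnrichmentResult]:
    """Process every company; one failure does not stop the rest."""
    return [enrich_company(cid, domain, fetch, api_key) for cid, domain in companies]
```

With this shape, supplying a real key is the only change needed for the same code path to start returning enriched records.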

The pain signal detector uses mocked news data — realistic healthcare articles mapped to known companies — rather than a live news feed. Everything downstream of that is real: the Claude classification, the gating logic, the outreach generation all run against real CRM fields. In production you'd connect a live news API or an intent data provider like Bombora at the same integration point.

The design philosophy was to build every integration point as if it were production and stub only the data sources that require funded third-party accounts to demo.

If I were taking this or something similar live, I would also include a feedback loop that retrains the scoring model on closed-won data over time, and a sequence layer that tracks outreach responses and adjusts follow-up cadence automatically.

Diagram of HubSpot CRM data flowing through enrichment, ML scoring, composite scoring, territory routing, AI outreach, and the intelligence dashboard

I built a GTM automation system that takes a company from "just entered our CRM" to "personalized outreach in the rep's queue" without anyone touching it manually.

It starts with a live connection to HubSpot that keeps our data fresh. The moment a company enters the system, it automatically gets enriched — we pull in firmographic data like headcount, revenue, industry, and tech stack from an external API so we're not working with a half-empty record.

From there, a machine learning model scores each company on ICP fit — basically asking "how closely does this company match our ideal customer?" — and returns a score from 0 to 100. That score gets combined with two other signals: how much the company has engaged with us, and whether they've shown signs of having the problem we solve. The result is a single composite priority score that determines which tier each company falls into. Everything writes back to HubSpot automatically.

When a new company is created, a webhook fires and routes it to the right rep instantly based on geography and company size. No spreadsheet, no Slack message, no manual assignment.

The AI layer sits on top of all of this. It reads each company's scores and signals, classifies ICP fit, detects intent, and generates personalized outreach for the BDR team using Claude. The output isn't a template — it's informed by what we actually know about each company.

A Streamlit dashboard ties it together — pipeline health, scoring breakdowns, routing audit trails — so the team always knows what the system is doing and why.

The whole thing is designed around one principle: get the right company in front of the right rep with the right message before a human has to think about it.

Key Design Decisions

Firmographic data drives half the score — and that's intentional.

In healthcare, certain things are either true or they're not. A company either has the headcount, the revenue, and the Medicare participation to be a real customer, or it doesn't. No amount of engagement changes that. So I weighted firmographic fit at 50%: it's the floor, not a tiebreaker. Engagement and urgency signals split the other 50% because they tell you when to reach out, not whether to. A perfect-fit company that hasn't engaged yet still surfaces as a priority. A poor-fit company that's been clicking around doesn't jump the queue.
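To make the weighting concrete, here is a minimal sketch of a composite score under stated assumptions: all three inputs are on a 0 to 100 scale, engagement and urgency split their half evenly (25% each), and the tier cutoffs are invented for illustration. Only the 50% firmographic weight comes from the description above; `composite_score`, `tier_for`, and the cutoff values are hypothetical.

```python
# Weights: 50% firmographic is from the write-up; the even 25/25 split
# of the remainder is an assumption for this sketch.
WEIGHTS = {"firmographic": 0.50, "engagement": 0.25, "urgency": 0.25}

# Hypothetical tier cutoffs, checked from highest to lowest.
TIER_CUTOFFS = [(75.0, "Tier 1"), (50.0, "Tier 2")]
LOWEST_TIER = "Tier 3"


def composite_score(firmographic: float, engagement: float, urgency: float) -> float:
    """Weighted sum of three 0-100 signals, returned on a 0-100 scale."""
    signals = {
        "firmographic": firmographic,
        "engagement": engagement,
        "urgency": urgency,
    }
    for name, value in signals.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return round(sum(WEIGHTS[name] * value for name, value in signals.items()), 1)


def tier_for(score: float) -> str:
    """Map a composite score to a priority tier."""
    for cutoff, tier in TIER_CUTOFFS:
        if score >= cutoff:
            return tier
    return LOWEST_TIER


# A perfect-fit company with no engagement yet still lands mid-table.
perfect_fit_quiet = composite_score(firmographic=100, engagement=0, urgency=0)

# A poor-fit company with maximum engagement stays below it.
poor_fit_active = composite_score(firmographic=20, engagement=100, urgency=0)
```

Under these assumed weights the quiet perfect-fit company scores 50.0 and the active poor-fit company scores 35.0, which is the ordering the paragraph above describes.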

Territory routing follows fixed rules, not load balancing.

I could have built a system that evenly distributes leads across reps automatically. I didn't, and it was a deliberate call. Fixed rules — region and company size determine which rep gets which company — mean every routing decision is explainable. You can look at any company in the CRM and know exactly why it went where it went without digging through server logs. The tradeoff is that if territories get unbalanced, someone has to update the rules manually rather than letting the system self-correct. For a sales team where rep accountability and transparency matter, that's worth it.
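A fixed-rule router of this kind can be sketched as an ordered rule table plus a human-readable reason for each decision. Everything specific here is an assumption for illustration: the region names, the 1,000-employee boundary, the rep identifiers, and the fallback queue are not taken from the real system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    region: str
    min_employees: int
    max_employees: Optional[int]   # None means no upper bound
    rep: str


# Hypothetical territory table; editing this list is the "manual update"
# needed when territories drift out of balance.
RULES: List[Rule] = [
    Rule("West", 0, 999, "rep_west_smb"),
    Rule("West", 1000, None, "rep_west_enterprise"),
    Rule("East", 0, 999, "rep_east_smb"),
    Rule("East", 1000, None, "rep_east_enterprise"),
]
FALLBACK_REP = "unassigned_queue"


def route(region: str, employees: int) -> Tuple[str, str]:
    """Return (rep, reason). The first matching rule wins."""
    for rule in RULES:
        under_cap = rule.max_employees is None or employees <= rule.max_employees
        if rule.region == region and employees >= rule.min_employees and under_cap:
            upper = "no cap" if rule.max_employees is None else str(rule.max_employees)
            reason = (
                f"region={region}, employees={employees} "
                f"matched {rule.min_employees}-{upper} -> {rule.rep}"
            )
            return rule.rep, reason
    return FALLBACK_REP, f"no rule matched region={region}, employees={employees}"


rep, reason = route("West", 2500)
```

Storing the `reason` string on the CRM record alongside the assigned rep is one way to keep each decision explainable from the record itself.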

HubSpot is the only source of truth.

Every component in the system reads from HubSpot and writes back to HubSpot. Nothing passes data to the next stage through local files or a shared database. This means that if one part of the pipeline goes down, the rest keeps running. It means every intermediate result shows up in the CRM in real time, not just at the end. And it means anyone on the team can see exactly what the system did and when. The cost is more API calls and more attention to rate limits. The benefit is a system that produces real, visible CRM changes at every step, which matters when you're asking a sales team to trust something they didn't build.
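The read-from-CRM, write-to-CRM pattern can be sketched with two independent stages that share nothing but the record itself. This is an illustration under assumptions, not the project's code: the `CRM` protocol, the in-memory stand-in, the placeholder scoring logic, and the property names `icp_fit_score` and `priority_tier` are all hypothetical, and a real version would call the HubSpot API where `InMemoryCRM` is used here.

```python
from typing import Dict, Protocol


class CRM(Protocol):
    """Minimal interface each stage depends on (hypothetical)."""

    def get_company(self, company_id: str) -> Dict[str, str]: ...
    def update_company(self, company_id: str, properties: Dict[str, str]) -> None: ...


class InMemoryCRM:
    """Stand-in for the real CRM so the sketch runs without network access."""

    def __init__(self, records: Dict[str, Dict[str, str]]) -> None:
        self._records = {cid: dict(props) for cid, props in records.items()}

    def get_company(self, company_id: str) -> Dict[str, str]:
        return dict(self._records[company_id])

    def update_company(self, company_id: str, properties: Dict[str, str]) -> None:
        self._records[company_id].update(properties)


def scoring_stage(crm: CRM, company_id: str) -> None:
    """Read raw fields from the CRM, write a score back to the CRM."""
    props = crm.get_company(company_id)
    employees = int(props.get("numberofemployees", "0"))
    score = 80 if employees >= 1000 else 40   # placeholder logic, not a real model
    crm.update_company(company_id, {"icp_fit_score": str(score)})


def tiering_stage(crm: CRM, company_id: str) -> None:
    """Read the score from the CRM; do nothing if it is not there yet."""
    props = crm.get_company(company_id)
    if "icp_fit_score" not in props:
        return  # scoring has not run (or is down); this stage simply skips
    tier = "Tier 1" if int(props["icp_fit_score"]) >= 70 else "Tier 2"
    crm.update_company(company_id, {"priority_tier": tier})
```

Because neither stage calls the other, either one can fail or run late without blocking the rest, and every intermediate value is visible on the record as soon as it is written.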