Overview
This wasn’t a planned project with a budget and a team. It was a grassroots effort born from identifying gaps in existing tooling and a desire to do meaningful analytical work. Here’s how it evolved.
Era 1: The Analyst Phase
January 2024 – June 2024
Joined QVC UK as a Senior Commercial Analyst. Three months in, I was assigned the Customer Segmentation Dashboard — a standard dashboard build in theory, but this one would serve the entire UK market. A lot of eyes were on it, which raised the stakes.
The Landscape
- Data lived in: Oracle, Hyperion, scattered Excel files
- Hardware: 16GB RAM laptop, often working from home over VPN
- My goal: Prove myself as an analyst, deliver the dashboard
The Project: Customer Segmentation Dashboard
A top-down mandate: classify customers into 5 segments and make this data available to the whole UK business.
The awkward truth? 80% of customers fell into just 2 of the 5 segments. The segmentation itself wasn’t particularly useful. But the project was necessary — it was my job to make it work.
Key Decision: Skip Hyperion, Use Tableau
Hyperion was the “approved” tool for extracts. On paper, it made sense: a drag-and-drop interface for building queries, the ability to schedule extracts, and direct access to Oracle. In practice, it wasn’t suited to my needs.
The problems with Hyperion were fundamental:
- No incremental refreshes. Every scheduled extract runs the full query from scratch. Need daily data? That’s a full query hitting Oracle every single day, increasing load on an already slow system.
- Locked to the data dump folder. Hyperion can only save extracts to a specific network folder. I needed data in SharePoint so Tableau could access it. Hyperion couldn’t do that.
- No flexibility for joins. I needed to enrich order data with customer segments, product attributes, and demographic information. Hyperion made this awkward at best, impossible at worst.
- End of life. Support ended years ago. The interface crashed regularly. Work got lost without warning. Investing time in mastering it felt like investing in legacy technology with no future.
So I opted out. Instead, I went with Tableau, the business-approved BI tool. The standard approach was to connect Tableau directly to the Oracle Data Warehouse using Custom SQL: the query runs on Oracle, and the results feed into Tableau. This works, but it has limitations. Renaming fields or doing analytical cleaning inside Tableau is awkward. As a bridge, you can load the data into Python, do the transformations there, and output a CSV that Tableau embeds in the workbook.
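A minimal sketch of that bridge pattern, assuming the python-oracledb driver and pandas; the query, credentials, table, and column names are all hypothetical:

```python
import os

import oracledb
import pandas as pd

# Hypothetical extract query against the Oracle Data Warehouse
QUERY = """
    SELECT order_id, customer_id, order_date, net_sales
    FROM dw.order_lines
    WHERE order_date >= DATE '2023-01-01'
"""

with oracledb.connect(
    user=os.environ["ORACLE_USER"],
    password=os.environ["ORACLE_PASSWORD"],
    dsn=os.environ["ORACLE_DSN"],
) as conn:
    df = pd.read_sql(QUERY, conn)

# Oracle returns upper-case column names; normalise before cleaning
df.columns = [c.lower() for c in df.columns]

# The kind of renaming and cleaning that is awkward to do inside Tableau
df["order_date"] = pd.to_datetime(df["order_date"])
df = df.rename(columns={"net_sales": "Net Sales (GBP)"})

# Tableau embeds this CSV in the published workbook
df.to_csv("orders_for_tableau.csv", index=False)
```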
The Real Problem: Tableau Server Refresh
I needed one big table — two years of order-level data with customer segments attached. About 60 million rows. Not an insane amount, and the computations were simple: summing sales, distinct counts of customers.
The initial build worked fine. Oracle → Tableau Desktop on my laptop was fast enough. I could pull two years of data, publish to Tableau Server, and set up incremental refreshes.
The problem came with full refreshes.
The connection between Oracle and Tableau Server (hosted in the US) was capped at around 7 Mbps. The Customer Segmentation Dashboard needed a full refresh every quarter because of how the segments were calculated. Those refreshes would time out. They’d fail. They’d break.
My workaround: rebuild the full dataset locally on my laptop every week or month, then publish. This took half a day to a full day. I used Dask to process the data out of core, because 16GB of RAM wasn’t enough to hold it all in pandas.
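Roughly what that local rebuild looked like, assuming the order extracts were staged as chunked CSVs and the quarterly segment assignment as a small lookup file; all names are illustrative:

```python
import dask
import dask.dataframe as dd
import pandas as pd

# Order extracts staged in chunks so nothing has to fit in 16GB of RAM at once
orders = dd.read_csv("extracts/orders_*.csv", parse_dates=["order_date"])

# The segment assignment is small enough for plain pandas
segments = pd.read_csv("extracts/customer_segments.csv")

# Attach segments to ~60M order lines; Dask streams the join partition by partition
enriched = orders.merge(segments, on="customer_id", how="left")

# The simple aggregations the dashboard needed: summed sales and distinct customers
sales_by_segment = enriched.groupby("segment")["net_sales"].sum()
customers_by_segment = enriched.groupby("segment")["customer_id"].nunique()
sales, customers = dask.compute(sales_by_segment, customers_by_segment)

# Full detail written back out (one CSV per partition) for Tableau to consume
enriched.to_csv("output/orders_enriched_*.csv", index=False)
```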
The Gap
I knew the one-big-table approach could work. Tableau could handle 60 million rows for simple aggregations. But the infrastructure around it — the Oracle-to-Server connection speeds, the lack of an intermediate layer — didn’t support this pattern.
This gap — between what was technically possible and what the existing infrastructure supported — became the motivation for building a modern data stack (MDS) of my own.
What I Learned
- The pattern worked: extract → transform → serve
- But it was manual, fragile, and took a full day to rebuild
- Hardware and network were real constraints, not excuses
- Seed planted: “There has to be a better way”
Era 2: The Wilderness
July 2024 – December 2024
The Problem
I wanted to be an impactful analyst. But getting the data I needed required significant manual effort each time. The tooling was the bottleneck, not my analysis skills.
By this point, I had a working solution for the Customer Segmentation Dashboard: Python scripts that pulled data from Oracle, performed the joins and transformations, and output CSVs to OneDrive for Tableau to consume. I even set up Prefect on a virtual machine to schedule these scripts automatically.
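A stripped-down sketch of that Prefect setup (task bodies omitted; the names are hypothetical, but the shape matches the extract, transform, and publish steps described above):

```python
from prefect import flow, task


@task(retries=2)
def extract_orders() -> str:
    # Pull the latest order data from Oracle and stage it locally
    ...
    return "staging/orders.csv"


@task
def transform(staged_path: str) -> str:
    # Join customer segments and product attributes, clean columns
    ...
    return "output/orders_enriched.csv"


@task
def publish(output_path: str) -> None:
    # Copy the finished CSV to OneDrive so Tableau can pick it up
    ...


@flow(name="customer-segmentation-refresh")
def refresh() -> None:
    staged = extract_orders()
    cleaned = transform(staged)
    publish(cleaned)


if __name__ == "__main__":
    # On the VM, Prefect ran this on a schedule; locally it can be invoked directly
    refresh()
```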
It worked. But it wasn’t built for long-term maintainability.
Every time I needed a new dataset, I faced the same problem: where do I get the data, how do I transform it, and how do I schedule the refresh? There was no reusable infrastructure. Every project started from scratch. I was spending most of my time wrestling with data plumbing instead of doing actual analysis.
The Organisational Context
The legacy stack worked. It had served the business for years, and people had built real expertise around it. But it wasn’t designed for the kind of analysis I wanted to do — large datasets, frequent refreshes, version-controlled transformations.
A cloud migration was in progress, but these things take time. Azure existed somewhere in the organisation, but it wasn’t yet accessible for ad-hoc analytics work. Oracle remained the source of truth, and the tooling around it was what we had to work with.
I raised the constraints with management. The response was reasonable: the cloud project was moving forward, and in the meantime we should work with what we had.
The Search
So I started looking for alternatives. Tools I could run myself, without waiting for IT approval.
- dbt — “Wait, you can version control SQL? And it generates documentation automatically?”
- DuckDB — “A database that just… runs? No server? No installation? Just a file?” (see the sketch after this list)
- BigQuery — “This would be perfect… but IT approval would take months.”
- Prefect — Already using it for scheduling, but it felt like overkill for what should be simple.
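To show why the DuckDB bullet landed the way it did, here is a minimal sketch; the file names are hypothetical:

```python
import duckdb

# The whole "database" is one local file; nothing to install beyond pip, no server
con = duckdb.connect("analytics.duckdb")

# DuckDB queries Parquet and CSV files in place, so staging data is a single statement
con.sql("""
    CREATE OR REPLACE TABLE orders AS
    SELECT * FROM read_parquet('extracts/orders_*.parquet')
""")

# Analytical queries over tens of millions of rows run comfortably on a laptop
print(
    con.sql("""
        SELECT segment,
               SUM(net_sales)              AS total_sales,
               COUNT(DISTINCT customer_id) AS customers
        FROM orders
        GROUP BY segment
        ORDER BY total_sales DESC
    """).df()
)
```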
I explored what was available. Azure Synapse existed but wasn’t accessible for my use case. Snowflake and BigQuery would have required approval processes I couldn’t shortcut. I read about data mesh, medallion architecture, modern analytics engineering.
Mostly, I read. And learned. And waited for something to click.
The Turning Point
At some point, I stopped waiting for the infrastructure to arrive and started asking: what can I build with what I have?
The constraints were real — limited RAM, no cloud access, legacy tooling. But constraints can be clarifying. If I couldn’t get approval for BigQuery, maybe I didn’t need BigQuery. If I couldn’t run a server, maybe I didn’t need a server.
What I Learned
- Enterprise IT moves at its own pace — Approval chains exist for good reasons (security, compliance, support). But they also mean you can’t always wait for the ideal solution.
- Cloud access is an organisational problem, not a technical one — Getting BigQuery or Snowflake wasn’t about capability. It was about navigating procurement, security reviews, and budget approval. That takes time and seniority I didn’t have.
- Local-first tools offer autonomy — If it runs on my laptop, I can start building today. This realisation would prove crucial.
- This era felt slow, but I was building knowledge — I didn’t ship much. But I learned what modern data infrastructure looked like, even if I couldn’t build it yet.
Era 3: The Acceleration
January 2025 – Present
Everything clicked. dbt + DuckDB inspired me to actually build something.
The Moment It Clicked
I’d been reading about dbt and DuckDB for months. But in January 2025, I stopped reading and started building.
The first real model I built? int_orderline_attribute.
This was the dbt model that replaced my painful Era 1 pipeline. Same purpose — order-level data with product attributes for the Customer Segmentation Dashboard. But instead of a day of Python crashes, it ran in minutes. And it was version-controlled. And documented. And testable.
- Era 1: Oracle → CSV → Python (crashes) → CSV → Tableau. Time: 1 day. Reproducible: barely.
- Era 3: Oracle → Parquet → dbt model → DuckDB → Tableau. Time: ~20 minutes. Reproducible: always.
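The real model lives in dbt SQL with Jinja refs, but conceptually int_orderline_attribute boils down to a join like the one below, shown here as a DuckDB query from Python with illustrative table and column names:

```python
import duckdb

con = duckdb.connect("warehouse.duckdb")

# Conceptual shape of int_orderline_attribute: order lines enriched with
# customer segments and product attributes. Names are illustrative only;
# the production version is a dbt model with sources, docs, and tests.
con.sql("""
    CREATE OR REPLACE TABLE int_orderline_attribute AS
    SELECT
        o.order_id,
        o.order_date,
        o.customer_id,
        s.segment,
        p.category,
        p.brand,
        o.net_sales
    FROM read_parquet('raw/order_lines/*.parquet') AS o
    LEFT JOIN read_parquet('raw/customer_segments.parquet') AS s
        USING (customer_id)
    LEFT JOIN read_parquet('raw/product_attributes.parquet') AS p
        USING (product_id)
""")
```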
AI-Accelerated Development
This is also when I discovered Claude Code. The combination of AI assistance and modern tooling meant I could build faster than I ever had before. Not by cutting corners, but by removing friction.
The Stack Crystallised
- dbt for transformations and lineage
- DuckDB for fast analytical queries (no server needed)
- Python for ingestion pipelines
- Git for version control
- Dagster for orchestration (sketched below)
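To give a sense of how those pieces connect, here is a deliberately simplified pair of Dagster assets; the names are hypothetical, and a plain subprocess call stands in for the real ingestion and dbt integration:

```python
import subprocess

from dagster import Definitions, asset


@asset
def oracle_order_extract() -> None:
    # Ingestion: pull incremental order data from Oracle and land it as Parquet
    # (body omitted; in practice this is the Python ingestion pipeline)
    ...


@asset(deps=[oracle_order_extract])
def dbt_models() -> None:
    # Transformation: build the dbt project against the local DuckDB file
    subprocess.run(["dbt", "build"], check=True)


defs = Definitions(assets=[oracle_order_extract, dbt_models])
```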
See case-studies for real examples of problems solved with this stack.
Progress
- Built 100+ dbt models across 4 markets (UK, DE, IT, JP)
- Created 50+ Dagster assets for orchestration
- Automated daily refresh replacing manual runs
- Integrated with Tableau Server for business delivery
What Changed
The difference wasn’t just tools — it was having a system.
Before: scattered SQL files, undocumented Python scripts, tribal knowledge. After: one repo, version controlled, self-documenting, and easy for someone new to pick up.
Along the way, I solved a number of technical challenges — DuckDB concurrency, European CSV dialects, safe file operations.
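As one example of the CSV-dialect work: continental European exports commonly use semicolon delimiters and comma decimals, which has to be handled explicitly rather than left to defaults. A hypothetical sketch with pandas:

```python
import pandas as pd

# Continental European exports often use ';' as the delimiter and ',' as the
# decimal mark (e.g. 1.234,56). Being explicit stops prices being read as text.
df = pd.read_csv(
    "extracts/de_orders.csv",   # hypothetical file
    sep=";",
    decimal=",",
    thousands=".",
    encoding="utf-8-sig",       # tolerate a byte-order mark from Excel exports
    parse_dates=["order_date"],
    dayfirst=True,              # dates like 31.01.2025
)
```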
Era 4: The Realisation
Present
Every good project needs an exit strategy. But sometimes the exit isn’t a handover — it’s an honest assessment of your own position.
The Honest Truth
I built a modern data stack. It works. It’s faster, cleaner, more maintainable than what existed before.
But driving platform-level change requires more than a working solution. It requires sustained advocacy, the right timing, and enough political capital to shepherd something through an organisation. As an individual contributor, I only have so much influence — and there are competing priorities, other ways of working, and market pressures that reasonably take precedence.
What I’ve Learned About Change
Technical solutions don’t adopt themselves. Organisational change requires:
- The right timing and business context
- Someone with the positional authority to champion it
- Bandwidth from teams who would need to adopt new workflows
- Alignment with broader strategic priorities
None of this is a criticism — it’s just reality. People have different learning preferences, different comfort levels with new tooling, and different views on what “good enough” looks like. That’s normal. My job was to build something that worked; whether it gets adopted is a separate question that depends on factors beyond my control.
What This Project Becomes
This portfolio. Proof that I can:
- Identify infrastructure problems
- Research and evaluate solutions
- Build a working system from scratch
- Document it for others to understand
Wherever the MDS lands organisationally, it demonstrates a transferable capability set. The learning compounds whether or not adoption follows.
The Lesson
The best infrastructure often comes from practitioners who felt a problem acutely and built a solution.
But building something isn’t the same as getting it adopted. Influence and timing matter as much as technical quality — and those aren’t always within your control.
If you’re in Era 2 right now — exploring options, trying things that don’t work — keep going. The learning compounds, even if adoption doesn’t follow immediately.
And if you build something great that doesn’t get picked up? That’s not failure. That’s proof of capability — and it travels with you.