Overview
I’m Vishal Kerai. I joined QVC in January 2024 as a Senior Commercial Analyst — but a natural affinity for solving problems with code led me down a different path. I became QVC International’s first analytics engineer, designing and building a modern data stack from scratch. This portfolio documents what I built and the thinking behind it.
The Environment I Inherited
QVC is a company rich in data but constrained by legacy infrastructure. When I arrived, the analytics platform had been built incrementally over years — before modern tooling was accessible to analysts.
My first project, a customer segmentation dashboard, exposed every weakness. Joining 60 million orderline rows with product and member data took over an hour in Oracle. Transfers were capped at 7 Mbps over VPN. CSV files were scattered across network drives with no unified source of truth. New business insights could take days or weeks to generate.
The broader stack looked like this:
- Oracle — the transactional source database, not designed for analytical workloads
- Hyperion — a deprecated OLAP layer scheduling extracts to network drives
- Excel + Power Query — the analytics platform. Analysts connected to Oracle via ODBC, transformed in Power Query, and built pivots. Business logic lived inside spreadsheets
- 16GB laptops — where it all ran, often from home over 7 Mbps VPN connections
It worked. People delivered insights, built dashboards, ran the business. But no one was bridging the gap between analyst work and modern engineering practices — multiple analysts were creating different versions of similar reports, nothing was version-controlled, and there was no path to scale.
Why I Built Something Different
When I hit the limitations — Excel row limits, day-long rebuilds, queries that crashed my laptop — I reached for what I knew: Python and SQL. Six months of exploration led me to dbt, DuckDB, and a local-first approach that worked within existing infrastructure rather than waiting for cloud approvals.
The full story is in The Journey.
What I Built
A modern analytics platform that consolidates data from Oracle, Azure Blob Storage, and SharePoint into a unified, version-controlled data warehouse powered by DuckDB and dbt.
Impact
Pipeline runtime: 1 day → 20 minutes. What used to crash my laptop now runs reliably every morning. 100+ dbt models serve 4 international markets (UK, DE, IT, JP), enabling ~50 stakeholders across the business to self-serve analytics through Tableau.
How People Use It
The platform isn’t just infrastructure — it’s how teams across QVC International get their data. Analysts build reports on top of it. Planners maintain their own classification prompts. Cross-market reporting that used to require weeks of manual alignment now runs on shared definitions and common models.
I trained the first peer analyst on the stack (Git, dbt, Python), and he went on to build reporting independently. Fragile institutional knowledge — like bespoke deduplication logic that lived in one person’s head — has been formalised into version-controlled, testable transformations.
More on this in The Journey.
Architecture at a Glance
```mermaid
flowchart LR
    subgraph Sources
        Azure[Azure Blob]
        Oracle[(Oracle)]
        SharePoint[SharePoint]
    end
    subgraph Ingestion
        Python[Python Pipelines]
    end
    subgraph Storage
        Parquet[Parquet Lake]
        DuckDB[(DuckDB)]
    end
    subgraph Transform
        DBT[dbt Models]
    end
    subgraph Serve
        Tableau[Tableau Server]
    end
    subgraph Orchestrate
        Dagster{Dagster}
    end
    Azure --> Python
    Oracle --> Python
    SharePoint --> Python
    Python --> Parquet
    Parquet --> DuckDB
    DuckDB --> DBT
    DBT --> Tableau
    Dagster -.-> Python
    Dagster -.-> DBT
    Dagster -.-> Tableau
```
Sources:
- Oracle — Legacy transactional database. Sales, inventory, customer data. Slow to query directly for large datasets.
- Azure Blob — Files dumped by data engineering. CSVs, Excel exports, and the Parquet Lake.
- SharePoint — Business-managed spreadsheets. Pricing, promotions, manual overrides.
📚 The Stack
Technologies powering the platform. Browse all →
| Technology | Purpose |
|---|---|
| Sources | Where the data comes from (Oracle, Azure, SharePoint) |
| Parquet Lake | Extending the existing data lake for analyst access |
| DuckDB | The embedded analytical engine at the core: a constraint-driven choice that made the rest of the stack possible |
| dbt | Transformation layer (medallion architecture) |
| dbt-duckdb | Technical patterns and SQL reference |
| Dagster | Orchestration and scheduling |
| Python Pipelines | Ingestion from Azure, Oracle, SharePoint |
| Tableau | Enabling self-serve analytics for the business |
🛤️ The Journey
How this platform evolved and what I learned. Browse all →
| Topic | Description |
|---|---|
| Case Studies | Real problems solved (STAR format) |
| Challenges | Technical obstacles overcome |
| AI-Accelerated Dev | How I used Claude Code |
🔧 How-To Guides
Practical patterns for common data problems. Browse all →
| Guide | Problem It Solves |
|---|---|
| Automating Tableau Extracts | Publish data to Tableau Server without opening Desktop |
| Building a Transformation Layer | Structure transformations so you’re not starting from scratch |
| Working with Large Data Locally | Handle datasets that push against your laptop’s RAM limits |
These guides share what I’ve learned. They’re based on Python/dbt, but the concepts often transfer to other tools.