Overview

I’m Vishal Kerai. I joined QVC in January 2024 as a Senior Commercial Analyst — but a natural affinity for solving problems with code led me down a different path. I became QVC International’s first analytics engineer, designing and building a modern data stack from scratch. This portfolio documents what I built and the thinking behind it.

The Environment I Inherited

QVC is a company rich in data but constrained by legacy infrastructure. When I arrived, the analytics platform had been built incrementally over years — before modern tooling was accessible to analysts.

My first project, a customer segmentation dashboard, exposed every weakness. Joining 60 million orderline rows with product and member data took over an hour in Oracle. Transfers were capped at 7 Mbps over VPN. CSV files were scattered across network drives with no unified source of truth. New business insights could take days or weeks to generate.

The broader stack looked like this:

  • Oracle — the transactional source database, not designed for analytical workloads
  • Hyperion — a deprecated OLAP layer scheduling extracts to network drives
  • Excel + Power Query — the analytics platform. Analysts connected to Oracle via ODBC, transformed in Power Query, and built pivots. Business logic lived inside spreadsheets
  • 16GB laptops — where it all ran, often from home over 7 Mbps VPN connections

It worked. People delivered insights, built dashboards, ran the business. But no one was bridging the gap between analyst work and modern engineering practices — multiple analysts were creating different versions of similar reports, nothing was version-controlled, and there was no path to scale.

Why I Built Something Different

When I hit the limitations — Excel row limits, day-long rebuilds, queries that crashed my laptop — I reached for what I knew: Python and SQL. Six months of exploration led me to dbt, DuckDB, and a local-first approach that worked within existing infrastructure rather than waiting for cloud approvals.

The full story is in The Journey.

What I Built

A modern analytics platform that consolidates data from Oracle, Azure Blob Storage, and SharePoint into a unified, version-controlled data warehouse powered by DuckDB and dbt.
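As a flavour of the transformation layer, a staging model in the medallion style might look like the sketch below. The model, source, and column names are hypothetical, not taken from the actual project; the point is the shape: read from a declared source, rename and type-cast, expose a clean contract.

```sql
-- models/staging/stg_orderlines.sql (hypothetical names throughout)
with source as (

    -- read from the Parquet lake via a dbt source definition
    select * from {{ source('parquet_lake', 'orderlines') }}

),

renamed as (

    select
        order_id,
        upper(market_code)                as market,
        cast(net_sales as decimal(18, 2)) as net_sales,
        cast(order_date as date)          as order_date
    from source

)

select * from renamed
```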

Impact

Pipeline runtime: 1 day → 20 minutes. What used to crash my laptop now runs reliably every morning. 100+ dbt models serve 4 international markets (UK, DE, IT, JP), enabling ~50 stakeholders across the business to self-serve analytics through Tableau.

How People Use It

The platform isn’t just infrastructure — it’s how teams across QVC International get their data. Analysts build reports on top of it. Planners maintain their own classification prompts. Cross-market reporting that used to require weeks of manual alignment now runs on shared definitions and common models.

I trained the first peer analyst on the stack (Git, dbt, Python), and he went on to build reporting independently. Fragile institutional knowledge — like bespoke deduplication logic that lived in one person’s head — has been formalised into version-controlled, testable transformations.

More on this in The Journey.

Architecture at a Glance

```mermaid
flowchart LR
    subgraph Sources
        Azure[Azure Blob]
        Oracle[(Oracle)]
        SharePoint[SharePoint]
    end

    subgraph Ingestion
        Python[Python Pipelines]
    end

    subgraph Storage
        Parquet[Parquet Lake]
        DuckDB[(DuckDB)]
    end

    subgraph Transform
        DBT[dbt Models]
    end

    subgraph Serve
        Tableau[Tableau Server]
    end

    subgraph Orchestrate
        Dagster{Dagster}
    end

    Azure --> Python
    Oracle --> Python
    SharePoint --> Python
    Python --> Parquet
    Parquet --> DuckDB
    DuckDB --> DBT
    DBT --> Tableau
    Dagster -.-> Python
    Dagster -.-> DBT
    Dagster -.-> Tableau
```

Sources:

  • Oracle — Legacy transactional database. Sales, inventory, customer data. Slow to query directly for large datasets.
  • Azure Blob — Files dumped by data engineering. CSVs, Excel exports, and the Parquet Lake.
  • SharePoint — Business-managed spreadsheets. Pricing, promotions, manual overrides.

📚 The Stack

Technologies powering the platform. Browse all →

  • Sources — Where the data comes from (Oracle, Azure, SharePoint)
  • Parquet Lake — Extending the existing data lake for analyst access
  • DuckDB — A constraint-driven database choice that enabled everything
  • dbt — Transformation layer (medallion architecture)
  • dbt-duckdb — Technical patterns and SQL reference
  • Dagster — Orchestration and scheduling
  • Python Pipelines — Ingestion from Azure, Oracle, SharePoint
  • Tableau — Enabling self-serve analytics for the business

🛤️ The Journey

How this platform evolved and what I learned. Browse all →

  • Case Studies — Real problems solved (STAR format)
  • Challenges — Technical obstacles overcome
  • AI-Accelerated Dev — How I used Claude Code

🔧 How-To Guides

Practical patterns for common data problems. Browse all →

  • Automating Tableau Extracts — Publish data to Tableau Server without opening Desktop
  • Building a Transformation Layer — Structure transformations so you’re not starting from scratch
  • Working with Large Data Locally — Handle datasets that push against your laptop’s RAM limits

These guides share what I’ve learned. They’re based on Python/dbt, but the concepts often transfer to other tools.