The dbt Layer: How I Model a Portfolio from Transactions Up
The hardest part of building a portfolio analytics system isn’t getting prices. It’s reconstructing exactly what you held on any given day, and computing returns from that.
Here’s how I do it in dbt.
The core problem
A transaction ledger tells you events: “on this date, I bought X shares of Y at price Z.” To compute portfolio performance, you need states: “on every day from then until now, I held N shares of Y.”
Reconstructing that state from events is the job of int_daily_holdings.
The staging layer
Three staging models clean and type the raw sources:
stg_raw_prices — Cleans the yFinance data. Handles null close prices (which appear on market holidays in the raw feed), casts dates properly, and renames columns to a consistent schema.
stg_fred_macro — Cleans FRED indicator data. The FRED API returns data in a long format (series_id, date, value) which I keep as-is here and pivot later.
stg_transactions — Cleans user transaction records. Normalises tickers to uppercase, validates that quantity and price are positive, and joins in the user’s ID.
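As a rough sketch, stg_transactions might look like the following. The source name and column names (raw.transactions, txn_date, side, and so on) are illustrative assumptions, not the actual schema:

```sql
-- Hypothetical sketch of stg_transactions; source and column names assumed.
with source as (
    select * from {{ source('raw', 'transactions') }}
)

select
    user_id,
    upper(trim(ticker))      as ticker,    -- normalise tickers to uppercase
    cast(txn_date as date)   as txn_date,
    side,                                  -- 'BUY' or 'SELL'
    quantity,
    price
from source
where quantity > 0                         -- validate positive quantity
  and price > 0                            -- validate positive price
```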
int_daily_holdings — the core model
This model generates one row per (user, ticker, date) for every day between a user’s first transaction and today.
The logic:
- For each (user, ticker), find all BUY and SELL events sorted by date.
- Generate a spine of all trading days using the prices table.
- For each day, compute the cumulative net position: sum of BUY quantities minus sum of SELL quantities for all transactions on or before that date.
- Filter out rows where net position is zero (before any buy, or after a full sell).
The spine-based approach is important. I use stg_raw_prices to generate the date spine rather than a calendar table, because I only need market days and the prices table already filters to those.
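The steps above can be sketched as a single model. This is a minimal, hedged version, assuming column names like price_date, txn_date, and side; the real model likely differs:

```sql
-- Sketch of int_daily_holdings; column names are assumptions.
with spine as (
    -- trading-day spine, borrowed from the prices table
    select distinct price_date as holding_date
    from {{ ref('stg_raw_prices') }}
),

txns as (
    select
        user_id,
        ticker,
        txn_date,
        -- BUYs add to the position, SELLs subtract
        case when side = 'BUY' then quantity else -quantity end as signed_qty
    from {{ ref('stg_transactions') }}
),

positions as (
    select
        t.user_id,
        t.ticker,
        s.holding_date,
        sum(t.signed_qty) as net_quantity
    from txns t
    join spine s
      on s.holding_date >= t.txn_date   -- all transactions on or before each day
    group by 1, 2, 3
)

select *
from positions
where net_quantity != 0                 -- drop days before any buy or after a full sell
```

The inequality join naturally restricts the spine to days on or after each (user, ticker)’s first transaction, and the final filter removes fully-sold positions.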
fct_daily_returns
Joins int_daily_holdings against stg_raw_prices to produce:
- Market value of each holding on each day (quantity × close price)
- Daily return (today’s value / yesterday’s value - 1)
- Cumulative return from first purchase
This is the main fact table that powers the portfolio time series charts.
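A hedged sketch of the join and the window functions involved, reusing the assumed column names from above (close_price, net_quantity, holding_date are illustrative):

```sql
-- Sketch of fct_daily_returns; column names are assumptions.
with valued as (
    select
        h.user_id,
        h.ticker,
        h.holding_date,
        h.net_quantity * p.close_price as market_value
    from {{ ref('int_daily_holdings') }} h
    join {{ ref('stg_raw_prices') }} p
      on p.ticker = h.ticker
     and p.price_date = h.holding_date
)

select
    *,
    -- daily return: today's value / yesterday's value - 1
    market_value
      / lag(market_value) over (
            partition by user_id, ticker order by holding_date
        ) - 1 as daily_return,
    -- cumulative return from first purchase
    market_value
      / first_value(market_value) over (
            partition by user_id, ticker order by holding_date
        ) - 1 as cumulative_return
from valued
```

Note that the first row of each (user, ticker) series has a null daily_return, since lag() has nothing to look back to.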
The gold layer
Four mart models sit on top:
gold_p_timeseries — Aggregates fct_daily_returns to portfolio level (summed across all tickers per user per day). The time series that drives the main chart.

gold__performance — Scalar KPIs per user: total return, annualised return, max drawdown, Sharpe ratio (approximated).

gold_ticker_kpis — Per-ticker breakdown: holding period, cost basis, current value, unrealised P&L.

gold_macro_context — Joins fct_macro_regimes to the portfolio time series so the frontend can shade the chart by macro regime.
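The portfolio-level aggregation is the simplest of the four. A sketch, assuming fct_daily_returns exposes a market_value column per (user, ticker, day):

```sql
-- Sketch of gold_p_timeseries; column names are assumptions.
select
    user_id,
    holding_date,
    sum(market_value) as portfolio_value   -- summed across all tickers
from {{ ref('fct_daily_returns') }}
group by 1, 2
```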
What I’d do differently
The date spine approach works but it means int_daily_holdings is a large model — one row per user per ticker per trading day. For two users with 10 tickers over 5 years, that’s already ~25,000 rows. It scales fine in Postgres for personal use, but if I ever want to support hundreds of users I’ll need to think about incremental materialisation more carefully.
dbt’s incremental models would help here. For now I’ve kept everything materialised as plain tables, because that keeps the logic simple and the data volumes are small. I’ll revisit this when it becomes a problem.
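If I did switch, the change would look roughly like this (the base model name and unique_key columns are illustrative):

```sql
-- Hypothetical incremental version of int_daily_holdings.
{{
    config(
        materialized='incremental',
        unique_key=['user_id', 'ticker', 'holding_date']
    )
}}

select user_id, ticker, holding_date, net_quantity
from {{ ref('int_daily_holdings_base') }}   -- hypothetical upstream model

{% if is_incremental() %}
  -- only process days newer than what's already in the target table
  where holding_date > (select max(holding_date) from {{ this }})
{% endif %}
```

The catch with incremental here is that holdings are a cumulative sum over history, so a late-arriving or backdated transaction would require reprocessing every day after it, not just new days.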
Macro regime classification
The fct_macro_regimes model deserves its own post. The short version: I use a set of FRED indicators (yield curve slope, unemployment trend, industrial production) to classify each month into one of four regime buckets. These get joined onto the portfolio time series so you can see how your portfolio performed in different macro environments. It’s rough but surprisingly useful context.
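Without preempting the full post, the shape of such a classification is a CASE expression over the monthly indicators. Everything here is illustrative: the thresholds, the bucket names, and the upstream model are assumptions, not the actual fct_macro_regimes logic:

```sql
-- Illustrative sketch only; real thresholds and bucket names differ.
select
    month_start,
    case
        when yield_curve_slope < 0 and unemployment_trend = 'rising'
            then 'contraction'
        when yield_curve_slope < 0
            then 'late_cycle'
        when unemployment_trend = 'falling' and industrial_production_yoy > 0
            then 'expansion'
        else 'recovery'
    end as regime
from {{ ref('int_macro_monthly') }}   -- hypothetical pivoted-monthly model
```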