Skip to main content

Documentation Index

Fetch the complete documentation index at: https://lightdash-docs-okta-cloud-sso-setup.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

Defining pre-aggregates

Pre-aggregates are defined under the pre_aggregates key in your model configuration. If you’re using Lightdash YAML instead of dbt model YAML, see the Lightdash YAML syntax guide for the surrounding model structure.
models:
  - name: orders
    meta:
      pre_aggregates:
        - name: orders_daily_by_status
          dimensions:
            - status
          metrics:
            - total_order_amount
            - average_order_size
          filters:
            - order_date: inThePast 52 weeks
          time_dimension: order_date
          granularity: day

Configuration reference

PropertyRequiredDescription
nameYesUnique identifier for the pre-aggregate. Must contain only letters, numbers, and underscores.
dimensionsYesList of dimension names to include. Must contain at least one dimension.
metricsYesList of metric names to include. Must contain at least one metric.
filtersNoStatic filters applied when materializing the pre-aggregate. Matching queries must include an equivalent or narrower filter to use this pre-aggregate.
time_dimensionNoA time-based dimension for date grouping. Must be paired with granularity.
granularityNoTime granularity for the time_dimension. Valid values: hour, day, week, month, quarter, year. Must be paired with time_dimension.
max_rowsNoMaximum number of rows to store in the materialization. If the aggregation exceeds this limit, the result is truncated. Must be a positive integer.
refreshNoSchedule configuration for automatic re-materialization. See Scheduling refreshes.
materialization_roleNoFixed access context to use when materializing the pre-aggregate. This is useful when your model or joined tables use required_attributes or any_attributes. See Materialization role.
If you specify time_dimension, you must also specify granularity, and vice versa.

Filtered pre-aggregates

Use filters when you want a pre-aggregate to materialize only a subset of the source data. For example, this pre-aggregate only stores data for the last 52 weeks:
models:
  - name: orders
    meta:
      pre_aggregates:
        - name: recent_orders_daily
          dimensions:
            - status
          metrics:
            - total_order_amount
            - order_count
          filters:
            - order_date: inThePast 52 weeks
          time_dimension: order_date
          granularity: day
This is useful when a rolling time window is queried frequently and deserves its own smaller materialization.

How query matching works with filters

Filtered pre-aggregates are only used when the query filters are compatible with the pre-aggregate definition:
  • A query with the same or narrower filter can use the pre-aggregate
  • A query without the filter, or with a broader or incompatible filter, falls back to another pre-aggregate or the warehouse
For the example above:
  • order_date inThePast 12 weeks can use the pre-aggregate
  • order_date inThePast 52 weeks can use the pre-aggregate
  • order_date inThePast 104 weeks cannot use the pre-aggregate
  • no order_date filter: cannot use the pre-aggregate
If a field is only used for filtering, you should still include it in the pre-aggregate’s dimensions list so Lightdash can match and re-aggregate queries correctly.

Multiple pre-aggregates per model

You can define multiple pre-aggregates on the same model, each targeting different query patterns. It is better to have multiple small, focused pre-aggregates rather than a single one containing all metrics and dimensions. Including too many dimensions increases the number of unique combinations, which generates large materialization files — this defeats the purpose of pre-aggregates, since they are meant to be smaller and faster than querying the warehouse directly. For example, you might want a fine-grained daily pre-aggregate for detailed dashboards and a coarser monthly one for summary views:
models:
  - name: orders
    config:
      meta:
        pre_aggregates:
          - name: orders_daily_by_status
            dimensions:
              - status
            metrics:
              - total_order_amount
              - order_count
            time_dimension: order_date
            granularity: day
          - name: orders_monthly_summary
            dimensions:
              - status
            metrics:
              - total_order_amount
            time_dimension: order_date
            granularity: month
            max_rows: 1000000
When a query matches multiple pre-aggregates, Lightdash picks the smallest one.

Scheduling refreshes

By default, pre-aggregates are materialized when your dbt project compiles. You can also schedule automatic refreshes using cron expressions, using your project’s configured timezone (defaults to UTC):
pre_aggregates:
  - name: orders_daily_by_status
    dimensions:
      - status
    metrics:
      - total_order_amount
    time_dimension: order_date
    granularity: day
    refresh:
      cron: "0 6 * * *"  # Every day at 6:00 AM UTC

Materialization triggers

Pre-aggregates can be materialized through four different triggers:
TriggerWhen it happens
CompileAutomatically when your dbt project is compiled
CronOn the schedule you define in refresh.cron
ManualWhen you trigger a refresh from the Lightdash UI

Row limits

You can set max_rows to cap the size of a materialization. If the aggregation produces more rows than the limit, the result is truncated.
When max_rows is applied, some data is excluded from the materialization. Queries that match the pre-aggregate may return incomplete results. Use this setting carefully and monitor for the “max rows applied” warning in the monitoring UI.

Materialization role

materialization_role is useful when access to the model depends on required_attributes or any_attributes. For example, if a joined table is only available to users with region_access: emea, then materializing a pre-aggregate without a fixed access context could produce different results depending on who triggered the build. Use materialization_role to make materialization run with a stable set of user attributes. This is intended for access control fields such as:
models:
  - name: orders
    meta:
      joins:
        - join: customers
          sql_on: ${customers.customer_id} = ${orders.customer_id}
      pre_aggregates:
        - name: orders_daily_by_region
          dimensions:
            - customers.region
          metrics:
            - total_order_amount
          time_dimension: order_date
          granularity: day
          materialization_role:
            email: materialize@acme.com
            attributes:
              region_access: emea

Complete example

Here’s a full model definition with a pre-aggregate, including joins, scheduling, and row limits:
models:
  - name: orders
    meta:
      joins:
        - join: customers
          sql_on: ${customers.customer_id} = ${orders.customer_id}
      pre_aggregates:
        - name: orders_daily_by_status
          dimensions:
            - status
            - customers.country
          metrics:
            - total_order_amount
            - average_order_size
          filters:
            - status: completed
          time_dimension: order_date
          granularity: day
          max_rows: 5000000
          refresh:
            cron: "0 6 * * *"
    columns:
      - name: order_date
        meta:
          dimension:
            type: date
      - name: status
        meta:
          dimension:
            type: string
      - name: amount
        meta:
          metrics:
            total_order_amount:
              type: sum
            average_order_size:
              type: average
With this pre-aggregate, the following queries would be served from materialized data:
  • Total order amount by status, grouped by day, week, month, or year
  • Average order size by status, grouped by month
  • Total order amount filtered to completed orders
  • Order amount by customer country, grouped by quarter
These queries would not match and would query the warehouse directly:
  • Queries including count_distinct metrics
  • Queries grouped by a dimension not in the pre-aggregate (for example, customer_id)
  • Queries with hourly granularity (finer than the pre-aggregate’s day)
  • Queries without status = completed or with a broader status filter
  • Queries with Parameters, user attributes inside SQL, or sql_filter
  • Queries with raw SQL table calculations