Data Warehouse Design: Star Schema vs. Snowflake Schema

Data Warehouse Design: Star Schema vs. Snowflake Schema

Woman in glasses
Jennifer Vilches

Star and snowflake schemas are similar at heart: a central fact table surrounded by dimension tables. The difference is in the dimensions themselves. In a star schema each logical dimension is denormalized into one table, while in a snowflake, at least some of the dimensions are normalized.

A snowflake design can be slightly more efficient in terms of database space, especially if the dimensions have many large text fields. However, in general, the size of the fact tables outweighs the size of the dimensions by a large enough factor that I don’t find this argument compelling.

The simplicity of a star schema will suffice in many designs and it definitely has the advantage of fewer joins to build and maintain. However, there are instances that will call for a snowflake design. Some OLAP reporting tools work more efficiently with a snowflake design. The multiple tier joins available in a snowflake design can make aggregation simpler as well. For instance, if the date dimension is snowflaked into day, month, and year and a monthly aggregate is built, that existing month dimension can easily be used by the aggregate table.

Horton Works

Hortonworks is a leading innovator in the industry, creating, distributing and supporting enterprise-ready open data platforms and modern data applications. Our mission is to manage the world’s data. We have a single-minded focus on driving innovation in open source communities such as Apache Hadoop, NiFi, and Spark. We along with our 1600+ partners provide the expertise, training and services that allow our customers to unlock transformational value for their organizations across any line of business. Our connected data platforms powers modern data applications that deliver actionable intelligence from all data: data-in-motion and data-at-rest. We are Powering the Future of Data™.