Implement SCD-2 ETL Data Pipelines in Snowflake using Streams & Tasks - Part 1

Debi Prasad Mishra
4 min read · Sep 30, 2021


Introduction: Part 1 explains how to build a Type 2 Slowly Changing Dimension (SCD-2) using Snowflake's Stream functionality. Part 2 will explain how to automate the process using Snowflake's Task functionality. Building an SCD in Snowflake is straightforward with the Streams and Tasks components Snowflake already provides. For the basics, I would advise reading my previous post, which explains how to implement SCD-2 pipelines using IICS and Merge statements.

Streams & Tasks: A stream is a Snowflake object that provides change data capture (CDC) capabilities: it tracks the delta of changes to a table, including insert, update, and delete operations, so that action can be taken using the changed data. Similarly, a task is a Snowflake object that defines a recurring schedule for executing SQL statements, including statements that call stored procedures. In other words, tasks provide an orchestration facility.
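As a minimal sketch of the two objects (all names here are placeholders, not the ones used later in this post):

```sql
-- A stream records inserts, updates, and deletes made to a table
CREATE OR REPLACE STREAM my_stream ON TABLE my_table;

-- A task runs a SQL statement (or stored procedure call) on a schedule
CREATE OR REPLACE TASK my_task
  WAREHOUSE = my_wh
  SCHEDULE  = '5 MINUTE'
AS
  CALL my_stored_procedure();

-- Tasks are created in a suspended state; resume to start the schedule
ALTER TASK my_task RESUME;
```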

Source & Target Structure: For this post, our source is a Customer table, which is simple in nature and easy to understand. The target is a Customer History table that keeps track of history. The metadata definitions are given below.

Image — 1: Metadata details
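The authoritative column list is in Image — 1; purely as an illustration (the column names and types below are assumptions), the pair of tables might look like this, with the history table adding the usual SCD-2 tracking columns:

```sql
-- Source table (assumed structure)
CREATE OR REPLACE TABLE customer (
  customer_id      NUMBER,
  first_name       VARCHAR,
  last_name        VARCHAR,
  email            VARCHAR,
  update_timestamp TIMESTAMP_NTZ
);

-- Target history table: same business columns plus SCD-2 tracking (assumed)
CREATE OR REPLACE TABLE customer_history (
  customer_id NUMBER,
  first_name  VARCHAR,
  last_name   VARCHAR,
  email       VARCHAR,
  start_time  TIMESTAMP_NTZ,  -- when this version became current
  end_time    TIMESTAMP_NTZ,  -- when this version was superseded (NULL if open)
  is_current  BOOLEAN         -- TRUE for the active version of each customer
);
```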

Stream Creation: Once the above tables are in place, let's move on. We will use "Customer Table Change" as the stream object. Execute the CREATE STREAM command, then the SHOW STREAMS command; the GUI shows the details of the stream we just created. The following screenshots are for reference only.

Image — 2: Stream creation
Image — 3: Run the command to make sure Stream has been created successfully
Image — 4: The stream's metadata columns help track the DML operations

Additionally, you can see three new columns in the screenshot. They identify the type of DML operation applied to the source table: METADATA$ACTION, METADATA$ISUPDATE, and METADATA$ROW_ID.
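A sketch of the commands behind the screenshots (object names are my assumptions based on the post): a stream can be queried like a table, which is where the METADATA$ columns appear. Note that a source-table UPDATE surfaces in the stream as a DELETE/INSERT pair with METADATA$ISUPDATE set to TRUE.

```sql
-- Create the stream on the source table
CREATE OR REPLACE STREAM customer_table_changes ON TABLE customer;

-- Verify the stream was created successfully
SHOW STREAMS;

-- Selecting from the stream exposes the change-tracking metadata columns
SELECT customer_id,
       METADATA$ACTION,    -- 'INSERT' or 'DELETE'
       METADATA$ISUPDATE,  -- TRUE when the row is one half of an UPDATE
       METADATA$ROW_ID     -- unique, immutable identifier for the source row
FROM customer_table_changes;
```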

User-defined View: We will create a view, Customer Change Data, that holds the logic for determining what needs to be loaded into the customer history table. I have written it so that, once it is joined with the stream, our Merge SQL stays small and efficient. The view is a union of three SELECT statements, with separate logic for each operation.

Image — 5: View handles INS Operation
Image — 6: View handles UPD Operation
Image — 7: View handles DEL Operation
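The actual view code is in the screenshots above; the following is a simplified sketch of what such a view typically looks like (column names and the `dml_type` flag are assumptions). The key idea: inserts produce one row, updates produce two rows (a new version to insert, plus a row keyed to the open history record so it can be end-dated), and deletes produce a row that closes the open history record.

```sql
CREATE OR REPLACE VIEW customer_change_data AS
-- 1) Brand-new customers: one 'I' row each
SELECT customer_id, first_name, last_name, email,
       update_timestamp AS start_time,
       NULL::TIMESTAMP_NTZ AS end_time,
       TRUE AS is_current,
       'I'  AS dml_type
FROM customer_table_changes
WHERE METADATA$ACTION = 'INSERT' AND METADATA$ISUPDATE = FALSE
UNION ALL
-- 2a) Updated customers: the new version to insert ('I')
SELECT customer_id, first_name, last_name, email,
       update_timestamp, NULL::TIMESTAMP_NTZ, TRUE, 'I'
FROM customer_table_changes
WHERE METADATA$ACTION = 'INSERT' AND METADATA$ISUPDATE = TRUE
UNION ALL
-- 2b) Updated customers: a row keyed to the open history record,
--     so the merge can end-date it ('U')
SELECT ctc.customer_id, ctc.first_name, ctc.last_name, ctc.email,
       ch.start_time,                       -- matches the open history row
       CURRENT_TIMESTAMP() AS end_time,
       FALSE, 'U'
FROM customer_table_changes ctc
JOIN customer_history ch
  ON ch.customer_id = ctc.customer_id AND ch.is_current = TRUE
WHERE ctc.METADATA$ACTION = 'DELETE' AND ctc.METADATA$ISUPDATE = TRUE
UNION ALL
-- 3) Deleted customers: end-date the open history record ('D')
SELECT ctc.customer_id, ctc.first_name, ctc.last_name, ctc.email,
       ch.start_time,
       CURRENT_TIMESTAMP() AS end_time,
       FALSE, 'D'
FROM customer_table_changes ctc
JOIN customer_history ch
  ON ch.customer_id = ctc.customer_id AND ch.is_current = TRUE
WHERE ctc.METADATA$ACTION = 'DELETE' AND ctc.METADATA$ISUPDATE = FALSE;
```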

Merge Statement: Now it's time to design the Merge statement. Let's have a look at the code below.

Image — 8: Merge Statement
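The statement itself is in Image — 8; a hedged sketch of a typical MERGE against such a view (key and column names are assumptions carried over from the earlier sketches) would be:

```sql
MERGE INTO customer_history ch
USING customer_change_data ccd
   ON ch.customer_id = ccd.customer_id
  AND ch.start_time  = ccd.start_time
-- Close the open version when the customer was updated or deleted
WHEN MATCHED AND ccd.dml_type IN ('U', 'D') THEN
  UPDATE SET ch.end_time   = ccd.end_time,
             ch.is_current = FALSE
-- Insert new customers and the new version of updated customers
WHEN NOT MATCHED AND ccd.dml_type = 'I' THEN
  INSERT (customer_id, first_name, last_name, email,
          start_time, end_time, is_current)
  VALUES (ccd.customer_id, ccd.first_name, ccd.last_name, ccd.email,
          ccd.start_time, ccd.end_time, ccd.is_current);
```

Because the view keys the end-dating rows to the open history record's start_time, a single MERGE can both close the old version and insert the new one.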

Now it's time to execute each step in sequence. Please stay with me for the demonstration steps, including the orchestration process using the Task component, which is explained in detail in Part 2.

Conclusion: Thank you for reading. I hope this post helps you gain a basic understanding of building an ETL process with Snowflake's own components. If you have more questions, you can reach out to me on my Twitter handle or my LinkedIn, or leave a comment below. Good luck!

References: Link to Snowflake online documentation


Written by Debi Prasad Mishra

Engineer by Profession, Student by Passion. Working as a Senior Business Intelligence Engineer. Snowflake Data Superhero 2022-23
