Semi-Structured Data Loading and Analyzing in Snowflake

Debi Prasad Mishra
4 min readSep 24, 2021

Introduction: Semi-structured data can be loaded into Snowflake in the form of relational tables without requiring a definition of a schema in advance. And the information will be stored directly into a columns of type VARIANT. JSON, Avro, ORC, and Parquet are the supported semi-structured data formats. And XML is also available as a preview feature. This article provides a step by step example for loading JSON data in Snowflake.

What is JSON: Java Script Object Notation is a light weight data interchange format. Very easy for humans to read and write. Makes easy for the machines to parse and generate. Based on a subset of the java script language standard. These above properties make JSON an ideal data-interchange language. JSON is built on two structures:

A JSON Example includes Complex Data Structure
  • A collection of name/value pairs. This is same as an object, record, struct, dictionary, hash table, keyed list, or associative array.
  • An ordered list of values. This is same as an array, vector, list, or sequence.

Schema Definition: Here we have used below following objects in Snowflake to get a better understanding of semi-structured data load. Such as data base, schema and tables to store our JSON data. I have used sensor_raw as a staging or landing table to hold the information that we have received from source.

File Format: To load the JSON object into Snowflake, the File Format is one of the important objects in the process. That describes a set of staged data to load or access into Snowflake. it must be unique for the schema in which the file format is created.

File Staging: In order to copy the data into Snowflake, we need some spot for the data files to be landed in. Snowflake provides two types of stages — i.e. Snowflake Internal stage or External stages(AWS, Azure, GCP). If you do not have any cloud platform, Snowflake provides space to store data into its cloud environment called — “Snowflake Internal stage”. In this article, we have used a Snowflake internal stage and created a dedicated stage for semi-structured data load demonstration. Please run the below query as mentioned.

Our source JSON structure will look like as mentioned below:

*Source: sensor_data.json

We are going to use PUT and COPY command here. First place the source file in the local directory. To place the file into internal stage have created earlier, from local system, need Snow SQL (CLI Client) via SQL statements. This task can’t be performed over UI interface. I have set up in my personal system. And please follow the instructions as mentioned. Login to CLI client.

Basic set up before executing Put command
To make sure we have Target Structure & Stage Object Created
File Listing in Stage location
File uploading to STAGE location From Local using PUT command
File Listing in Stage location — New File Uploaded

File Loading: In order to load the data from internal stage to table, we will use the copy command as mentioned.

File data loading into SENSOR_RAW Table using Copy command
Table Data validating using Snowflake UI

Data Querying: Snowflake supports SQL queries that access semi-structured data using special operators and the functions. Note that this topic applies to JSON, Avro, ORC, and Parquet data. And the topic does not apply to XML data as per online documentation. Please follow the SQL queries as mentioned.

1- Querying all data elements (refer *Source)
2- Extracting data From Event Array (refer *Source)
3- Extracting data From X3 Array (refer *Source)
4- Combined SQL to Extract Data From JSON (refer *Source)

Conclusion: Thank you for reading, I hope this blog will help you getting the basic understanding of Semi-structured data loading, data manipulation and interaction in Snowflake. Same approach can be followed for other supported data formats and can be referred to online documentation for any resolutions. Finally you can reach out to me in case of more questions you may have, on my twitter handle or my LinkedIn or leave a comment below. Good luck!

References: Please click on the hyper linked words/ terms for more technical details in Snowflake online documentation.

--

--

Debi Prasad Mishra

Engineer by Profession, Student by Passion. Working as a Senior Business Intelligence Engineer. Snowflake Data Superhero 2022-23