Snowflake Performance Tuning Checks Towards Query Optimization - Part 1
Introduction: In general, Snowflake executes queries quickly and efficiently through its internal mechanisms, which sets it apart from other vendors of analytical platforms. It rarely requires manual intervention, so slowness in a query is often a symptom of a mistake in the way the query was written, or of a few external factors. As we all know, Snowflake provides fast query performance without even needing indexes. We should design the system so that the extraction step minimizes the latency of each query while maximizing throughput, so that raw data can be rapidly transformed into a business-ready format, and so that massive volumes of data can finally be loaded into the target without bottlenecks. In this blog I have tried my best to shed some light on the areas that need care while designing ETL data pipelines in Snowflake. Let's move on.
1- Bulk data load: The common method of bulk data load is transferring data from on-premise storage to cloud storage, and then using the COPY command to load it into Snowflake. Before copying data, Snowflake checks whether each file has already been loaded. This leads to the very first step towards maximizing load performance: partition the staged data files, so that Snowflake avoids re-scanning terabytes of files that have already been loaded. The code snippet listed below shows a COPY using a range of options:
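The following is a minimal sketch; the table, stage, path, and file-format options are illustrative, not from any specific pipeline:

COPY INTO raw.sales
  FROM @landing_stage/sales/2023/06/          -- a partitioned (dated) staging path
  PATTERN = '.*sales_.*[.]csv[.]gz'           -- load only the matching files
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"')
  ON_ERROR = 'ABORT_STATEMENT'                -- fail fast on bad records
  PURGE = FALSE;                              -- keep the staged files after loading

Because FROM points at a dated sub-folder, Snowflake only lists and checks the files under that prefix instead of re-scanning the whole stage.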
2- Split file into multiple chunks: Snowflake provides a multi-cluster, multi-threaded architecture, so to make use of all the nodes in the cluster it is better to split your data into multiple small files rather than one large file. Loading a single large file puts only one node into action while the other nodes sit idle. Follow the same practice for data unloading as well, as shown in the sketch below.
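On the unloading side, Snowflake already writes multiple files in parallel by default, and MAX_FILE_SIZE keeps each output file small. A sketch, assuming an illustrative stage and table:

COPY INTO @unload_stage/sales_export/
  FROM raw.sales
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
  MAX_FILE_SIZE = 104857600                   -- cap each output file at ~100 MB
  SINGLE = FALSE;                             -- the default: many files, in parallel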
3- Virtual warehouse allocation: Selecting a larger warehouse (scaling up) to speed up the loading process is not a good idea when loading a single large data file. In reality, scaling up has no performance benefit in this case: the COPY statement will open a 10 GB data file and load it sequentially using a single thread on one node, leaving the other servers idle. Unless you have parallel loads running on the same virtual warehouse, this is inefficient, as you pay for four servers while using only one. A better approach is to break the 10 GB file up into 100 x 100 MB files to make use of Snowflake's automatic parallel execution, which runs much faster.
4- Latency vs Throughput: Reducing latency, i.e. maximizing the performance of individual queries, is just as important as maximizing throughput, i.e. getting the greatest amount of work done in the shortest possible time. By increasing the virtual warehouse size, it is possible to cut the elapsed time of a job dramatically, but this is not the solution for every use case. One way to improve throughput is to scale up to a bigger virtual warehouse so the work completes faster; while this may improve query performance, there is also a greater chance of using resources inefficiently on a larger warehouse.
One other approach is to execute multiple parallel tasks, each with a different connection, all using the same virtual warehouse. As the workload increases, jobs begin to queue because there are insufficient resources available. The Snowflake multi-cluster feature can be configured to automatically spin up another same-size virtual warehouse, which continues to take up the load. As tasks complete, the warehouse automatically scales back down to a single cluster, and once the last task finishes, the last running cluster is suspended. This is by far the most efficient way of completing a batch of parallel tasks, and we still have the option of scaling up. The SQL code below demonstrates how to create a multi-cluster warehouse that automatically suspends after 60 seconds of idle time, with the ECONOMY scaling policy to favor throughput and credit savings over individual query latency.
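A sketch of such a warehouse (the name and size are illustrative):

CREATE WAREHOUSE IF NOT EXISTS batch_wh
  WITH WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4                       -- scale out to at most 4 same-size clusters
  SCALING_POLICY = 'ECONOMY'                  -- favor throughput and credits over latency
  AUTO_SUSPEND = 60                           -- suspend after 60 seconds of idle time
  AUTO_RESUME = TRUE;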
5: Select only the required columns- Snowflake uses a columnar data store architecture. Columnar databases achieve significant gains over databases that physically organize data by row: storing the data in columns makes it much more efficient to retrieve a small sub-set of columns from the entire table.
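For example (the table and column names are illustrative):

SELECT * FROM raw.sales;                      -- reads every column from storage
SELECT order_id, order_total FROM raw.sales;  -- reads only the two columns needed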
6: Dedicated warehouse for workloads- Snowflake automatically caches data retrieved by a warehouse in its local disk cache, so place users querying the same data on the same virtual warehouse. This maximizes the chances that data pulled into the cache by one user will also be reused by others. This cache is erased once the warehouse is suspended. The result cache, by contrast, is maintained by the global services layer: any query executed by any user on the account can be served from the result cache, provided the SQL text is the same, and results are retained for 24 hours only. Even so, it is good practice to separate workloads by type; this means running BI queries from marketing users on one warehouse, while running a separate warehouse to support finance users.
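A sketch of one warehouse per workload, with illustrative names:

CREATE WAREHOUSE IF NOT EXISTS marketing_bi_wh
  WITH WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;

CREATE WAREHOUSE IF NOT EXISTS finance_wh
  WITH WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;

-- Each team's queries warm, and then reuse, their own warehouse's disk cache.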
7: Maximize cache usage- Snowflake's internal architecture caches data both in the warehouse (the local disk cache) and in the cloud services layer (the result cache). Taking steps to maximize cache usage is a simple way to improve overall query performance. Dashboards frequently execute the same queries on a daily basis or on a schedule, and Snowflake automatically optimizes these by returning results from the result cache, with results available for 24 hours after each query execution. You should be aware that when a warehouse is resumed, its local cache may be clean, so you may lose the performance benefits of warehouse caching.
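The result cache is controlled by the USE_CACHED_RESULT session parameter, which is on by default; the query and table below are illustrative:

ALTER SESSION SET USE_CACHED_RESULT = TRUE;
SELECT region, SUM(order_total) FROM raw.sales GROUP BY region;
-- Re-running the identical SQL text within 24 hours, with unchanged underlying
-- data, is answered from the result cache without even resuming the warehouse.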
8: Scale Up for large workloads- Snowflake allows you to scale up a warehouse to handle large workloads in a better way, and it is very sensible to resize the warehouse to improve overall query performance for a heavy job. Scaling up adds additional servers, distributing the workload among more nodes and effectively increasing the overall warehouse cache size.
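Resizing can be done on the fly; a sketch with an illustrative warehouse name:

ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'LARGE';  -- scale up for the heavy batch
-- ... run the large workload ...
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'SMALL';  -- scale back down afterwards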
9: Scale Out for concurrency- Unlike the scale-up option described above, this technique deploys additional clusters of same-sized nodes for concurrency, i.e. a growing number of users, rather than a bigger task size or complexity. Instead of deploying a cluster of larger machines, this instructs Snowflake to add additional same-size clusters as needed, up to a fixed limit. The SQL snippet below shows the statement needed to deploy a multi-cluster scale-out architecture.
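A sketch, with an illustrative warehouse name; unlike the ECONOMY example earlier, the STANDARD policy starts additional clusters as soon as queries begin to queue, favoring low latency for concurrent users:

ALTER WAREHOUSE bi_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 6                       -- the fixed upper limit of clusters
  SCALING_POLICY = 'STANDARD';                -- add a cluster when queries queue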
Conclusion: Thank you for reading. I hope this blog helps you gain a basic understanding of Snowflake performance tuning activities and their measures, whether through Snowflake components or your own design approach. You can reach out to me with any further questions on my Twitter handle or my LinkedIn, or leave a comment below. Good luck!
References: Snowflake online documentation
COPY INTO <table> — Snowflake Documentation
Managing Regular Data Loads — Snowflake Documentation
Preparing Your Data Files — Snowflake Documentation