As companies build larger and more diverse data sets, a new problem has surfaced: secure data sharing. Teams need to share data with partners inside and outside of their companies, but it's not always easy to share it safely, securely, and effectively.
How do you keep your data safe from hackers? Do you have a safe and effective way to get it to your business partners? Can you make sure they have the most up-to-date copy?
This article introduces Snowflake and shows how to share data securely on the Snowflake platform.
What Is Snowflake?
Snowflake is a fully managed cloud data platform for data engineering, data science, data lakes, and data warehouses. It's also an excellent platform for sharing and consuming shared data. Snowflake is available on Amazon Web Services, Microsoft Azure, and Google Cloud.
Snowflake comes with on-the-fly scalability and robust automation capabilities. And, of course, it has powerful data-sharing features that we'll cover in depth below.
It's helpful to understand a few key points about Snowflake.
- Snowflake clients can use an ANSI SQL interface or the Snowpark library to write and read data.
- It manages your storage, including how the information is structured.
- Snowflake manages all database security and data encryption, too. Their processes and procedures meet industry certifications like PCI DSS and HIPAA.
Security is an essential aspect of Snowflake's offering. The company argues that it built security into the system rather than "bolting it on." Snowflake is deployed at government agencies and is approved for several certified levels of access.
Snowflake's robust platform has many use cases, including:
- A data warehouse to process and store large volumes of data. It can easily drive an analytics platform.
- Their scalable systems can automatically scale up and down for dynamic Extract, Transform and Load (ETL) pipelines.
- Its infrastructure is also effective for sharing the results of ETL across a large enterprise and to third-party partners.
What Is Secure Data Sharing?
Data Sharing
Data sharing is the process of distributing collections of data to other users and apps. Effective data sharing does this while maintaining data integrity. It also restricts access to only the people who need it.
Wikipedia's definition of data sharing discusses sharing scholarly and scientific information. It covers the reasons, methods, and problems involved with those activities. For a long time, data sharing was primarily an academic issue. But that's not true anymore. Now, as data sets grow in size, scope, and importance, data sharing is an important business activity.
Data sharing for business presents a significant set of problems. Some are new, and some have been around for a long time.
The new problems emerge from the nature of the data businesses collect and analyze. Business data often contains sensitive information about customers or business operations, and companies collect vast quantities of information.
Here are some of the problems effective data sharing must account for:
- Moving large volumes of data is expensive, time-intensive, and prone to errors.
- Moving big data creates an Extract, Transform and Load (ETL) process at the destination.
- When you copy information, you create a synchronization problem. What happens when you update the original? Now you need a pipeline.
- Every time you move or copy information, you open it to new attacks.
- When you create a place for partners to access data, you need a security process.
Secure Data Sharing
Secure data sharing guarantees the safety and integrity of data. It enforces access control so that users only see the data they should. And, it locks out users that shouldn't see any data at all. It also ensures that data stays safe and complete until it reaches its final destination.
What Is Snowflake Data Sharing?
Snowflake lets you share data without moving or copying it. Its services layer provides read-only access to the information that data owners mark as eligible for sharing.
This design solves several issues:
- If you don't copy or move data, it can't fall out of synchronization with the source.
- Since the consumer can't post changes to the data, you've avoided another synchronization issue.
- Data remains at rest and encrypted at "home."
- Consumers request data as they need it instead of retrieving large files and paying for storage and ETL pipelines.
In Snowflake terminology, Providers share data with Consumers.
Snowflake Shares
Providers share data by creating Snowflake Shares. Shares contain the information Snowflake needs to share a database:
- Read access to specific objects in the database.
- Usage access to the database and underlying schema.
- Consumers with read access to the objects.
The share makes the database and objects accessible in the consumer's accounts. New objects are available in real-time, and the provider can revoke access at any time.
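To make the "revoke access at any time" point concrete, here is a minimal sketch of how a provider might withdraw access. It assumes the share, database, table, and account names used in the tutorial later in this article (myshare, mydata_db.myschema.shared_table, customer1), so treat it as an illustration rather than a required procedure.

```sql
-- Assumed names: myshare, mydata_db.myschema.shared_table, and customer1
-- come from the tutorial later in this article.

-- Remove a single consumer account from the share.
alter share myshare remove accounts = customer1;

-- Or stop sharing one object with every consumer of the share.
revoke select on table mydata_db.myschema.shared_table from share myshare;

-- Or remove the share entirely.
drop share myshare;
```

Because consumers read the provider's data in place, each of these statements takes effect without having to track down or delete copies.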
Snowflake Marketplace
With Snowflake, you can share data with more than one consumer and from more than one database. This capability is the foundation of Snowflake's Data Marketplace, and any Snowflake user can contribute data to the marketplace using these tools.
Secure Snowflake Data Sharing
Providing your Snowflake data to consumers is a straightforward process. Let's see how it's done.
We'll perform these steps using Snowflake's SQL. So, you'll need a session via the web interface or another SQL tool.
For this tutorial, we'll assume the following:
- The user running the SQL commands has accountadmin access rights.
- We're sharing tables from a database named mydata_db.
- The database has a schema named myschema.
- Consumers will be able to see a table named shared_table.
- The consumer accounts are customer1 and customer2.
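If you want to follow along in a scratch account, the assumptions above could be set up with something like the sketch below. The column names and types are purely hypothetical; the article does not say what shared_table contains.

```sql
-- Hypothetical setup matching the assumptions above.
-- The columns are illustrative only; use your own table definition.
use role accountadmin;

create database if not exists mydata_db;
create schema if not exists mydata_db.myschema;

create table if not exists mydata_db.myschema.shared_table (
    id integer,
    description string
);
```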
So let's get started!
Providing a Direct Data Share
First, we need to create a share:
create share myshare;
Then, we need to add the shared objects to the share. In order to see shared_table, our consumers need to have usage access to the database and schema. Then they need select access on the table.
So, we'll add three objects to the share:
grant usage on database mydata_db to share myshare;
grant usage on schema mydata_db.myschema to share myshare;
grant select on table mydata_db.myschema.shared_table to share myshare;
Before adding users to the share, we can view its privileges to make sure we didn't miss anything.
show grants to share myshare;
+-------------------------------+-----------+------------+---------------------------------+------------+-----------------+--------------+--------------+
| created_on                    | privilege | granted_on | name                            | granted_to | grantee_name    | grant_option | granted_by   |
|-------------------------------+-----------+------------+---------------------------------+------------+-----------------+--------------+--------------|
| 2022-02-15 16:45:07.307 -0700 | USAGE     | DATABASE   | MYDATA_DB                       | SHARE      | MYSTUFF.MYSHARE | false        | ACCOUNTADMIN |
| 2022-02-15 16:45:10.310 -0700 | USAGE     | SCHEMA     | MYDATA_DB.MYSCHEMA              | SHARE      | MYSTUFF.MYSHARE | false        | ACCOUNTADMIN |
| 2022-02-15 16:45:12.312 -0700 | SELECT    | TABLE      | MYDATA_DB.MYSCHEMA.SHARED_TABLE | SHARE      | MYSTUFF.MYSHARE | false        | ACCOUNTADMIN |
+-------------------------------+-----------+------------+---------------------------------+------------+-----------------+--------------+--------------+
There are the three grants we gave to the share!
Now, we need to add the consumers. We can add them as a list with a single command.
alter share myshare add accounts=customer1, customer2;
The share is ready to go! So, let's see what it looks like:
show shares;
+-------------------------------+----------+-------------------------+-----------------------+----------------------+--------------+---------------------------+
| created_on                    | kind     | name                    | database_name         | to                   | owner        | comment                   |
|-------------------------------+----------+-------------------------+-----------------------+----------------------+--------------+---------------------------|
| 2022-02-15 17:02:29.625 -0700 | OUTBOUND | MYSTUFF.MYSHARE         | MYDATA_DB             | CUSTOMER1, CUSTOMER2 | ACCOUNTADMIN |                           |
+-------------------------------+----------+-------------------------+-----------------------+----------------------+--------------+---------------------------+
And just like that, you're ready to share data with your consumers.
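For completeness, here is a rough sketch of what the consumer side might look like. It assumes the provider account is named mystuff (as the MYSTUFF.MYSHARE value in the output above suggests) and uses a hypothetical local database name, shared_mydata. Account identifiers vary between Snowflake deployments, so adjust the share reference to match yours.

```sql
-- Run in a consumer account (for example customer1), not in the provider account.
use role accountadmin;

-- Shares offered to this account are listed with kind INBOUND.
show shares;

-- Create a read-only database backed by the provider's share.
-- "shared_mydata" is a hypothetical name chosen by the consumer.
create database shared_mydata from share mystuff.myshare;

-- Let another role in the consumer account query the shared objects.
grant imported privileges on database shared_mydata to role sysadmin;

-- Query the shared table in place; no data is copied into the consumer account.
select * from shared_mydata.myschema.shared_table limit 10;
```

The consumer never loads or stores the data. The database created from the share is just a read-only reference to the objects the provider granted.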
TrustLogix Provides Data Centric Security for Snowflake
TrustLogix provides monitoring and visibility into data sharing activities, which helps teams collaborate faster across organizations. The solution provides visibility into:
- Who is sharing data with whom
- Who has sharing permissions
- Which roles have sharing permissions
- What data shares are not being used
- Confidential and PII data sharing
Based on sharing activity patterns, TrustLogix can detect and alert on anomalous sharing activities, noncompliant data sharing, separation of duties (SoD) violations, unexpected grants of data sharing permissions, and unexpected sharing of sensitive data.
Conclusion
We've discussed what data sharing is and what's required to make it successful. Secure data sharing needs solid access control, and the data needs to remain secure while it's at rest.
We also discussed Snowflake, a robust data cloud platform for data warehousing and engineering. It makes consuming, processing, and sharing large data sets easy, safe, and secure. It implements secure data sharing via its strong security features and by making it possible to share data without moving or copying it.
Finally, we used Snowflake's SQL tools to create a data share and share it with two consumer accounts. Then, we used the same tools to examine the share and see that it was configured correctly.
Contact TrustLogix for a demonstration of how we can enable faster collaboration and secure data sharing in Snowflake.