Data is being created today at a speed and scale that we’ve never seen before. Spurred on by the promise of digital transformation, much of this data is created natively in, or published out to, cloud data platforms like Snowflake, Redshift, and Databricks. Unfortunately, broad swaths of this data are then left ungoverned, leading to data breaches that expose organizations’ most sensitive assets.
Fortunately for CISOs, CDOs, and data owners and stewards, solutions already exist that empower you to govern data as it’s being created. Taking this data-centric security approach is the best way to ensure that you’re securing your data “as it’s born”.
Traditionally, as new systems were being rolled out, they would go through a Security Review or Governance cycle where Identity & Access Management (IAM) fundamentals would be applied to ensure that only the appropriate people would have access to the systems and information to which they were entitled.
Today, as organizations move to massive cloud-based data platforms like Snowflake, Redshift, and Databricks, the old model no longer holds.
Read: Why IAM is Insufficient for Data-Centric Security
These new dynamics lead to challenges that we’re facing for the first time as an industry. How do we allow the business to move forward at the pace and scale it needs without leaving that data inadequately protected? Our current approaches leave us exposed along a number of dimensions.
Read: 7 out of the top 10 Data Breaches of 2021 were in the cloud
The first step in tackling this problem is to use data classification engines. There are myriad solutions available that you can point at every one of your cloud data sources to automatically classify the data within as PII, PCI, HIPAA, etc.
Bottom line: You can’t actually have data access control if you don’t know where your sensitive data resides, and exactly how sensitive it is. Select a tool and point it to your data sources (with resource cycles built in for curation) to automatically classify your data as it’s born. Even without any other controls in place, it will give you visibility into where you have data risk exposure.
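To make the idea of classifying data "as it's born" concrete, here is a minimal sketch in Python. It is not how any particular classification product works: the two regex patterns, the `classify_column` and `classify_table` helpers, and the 0.8 match threshold are all assumptions chosen for illustration, and real engines use far richer detection plus human curation.

```python
import re

# Hypothetical detection patterns, for illustration only.
PATTERNS = {
    "PII": re.compile(r"^\d{3}-\d{2}-\d{4}$"),      # looks like a US Social Security number
    "PCI": re.compile(r"^\d{4}([ -]?\d{4}){3}$"),   # looks like a 16-digit card number
}

def classify_column(values, threshold=0.8):
    """Return the tags whose pattern matches at least `threshold` of the non-empty values."""
    samples = [v for v in values if v]
    tags = set()
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in samples if pattern.match(v))
        if samples and hits / len(samples) >= threshold:
            tags.add(tag)
    return tags

def classify_table(table):
    """Map each column name of a newly created table to its set of classification tags."""
    return {column: classify_column(values) for column, values in table.items()}

# A newly created table, represented as column name -> sampled values.
new_table = {
    "ssn": ["123-45-6789", "987-65-4321"],
    "card": ["4111 1111 1111 1111", "5500-0000-0000-0004"],
    "city": ["Austin", "Denver"],
}
report = classify_table(new_table)
```

Even on its own, a report like this shows which columns carry risk, which is the visibility the paragraph above describes.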
You should come away with a clear sense of:
Read: Collibra and TrustLogix Close the Loop Between Data Classification and Data Access
There are numerous ways to apply granular data access control policies to your sensitive information. Each cloud data platform (Snowflake, Redshift, Databricks, etc.) provides rich native constructs that allow you to define which users or roles should have access to what data. Use these native tools as the foundation for granular control over how users can access your data.
Leverage these capabilities in conjunction with the learnings from your data classification engine(s) to model and enforce data access control policies that can incorporate classification tags to more tightly control how your sensitive data is accessed.
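As a rough illustration of combining role-based controls with classification tags, the sketch below decides which columns a role may see. The role names, the `ROLE_CLEARANCE` mapping, and the `visible_columns` helper are hypothetical and stand in for whatever native constructs your platform provides; this is not any platform's actual API.

```python
# Hypothetical mapping of roles to the classification tags they are cleared to read.
ROLE_CLEARANCE = {
    "analyst": set(),
    "billing": {"PCI"},
    "privacy_officer": {"PII", "PCI"},
}

def visible_columns(role, column_tags):
    """Return the columns whose tags are all covered by the role's clearance.

    Unknown roles get an empty clearance, so they see only untagged columns.
    """
    allowed = ROLE_CLEARANCE.get(role, set())
    return [column for column, tags in column_tags.items() if tags <= allowed]

# Tags as a classification engine might have produced them (assumed input).
column_tags = {"ssn": {"PII"}, "card": {"PCI"}, "city": set()}
```

With these assumptions, `visible_columns("billing", column_tags)` returns the card and city columns but not the SSN column, because the decision is driven by the tag rather than by the column name.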
Takeaways:
Read: Data Lake Security: An Explanatory Guide with Best Practices
Here’s where the magic can happen. You can put these two critical pieces together to protect your data as it is being created, as part of a holistic data-centric security strategy. You can now model policies that apply to all data across all platforms without prior knowledge of the contents of the data being created.
Data Access Governance platforms let you use classification tags to define overarching policies once, so that those policies are enforced on new data immediately, based on the sensitivity of that data.
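The sketch below illustrates the general pattern under stated assumptions: policies are written against tags before any dataset exists, and a newly created, newly classified table is covered the moment its tags are known. The `Policy` class, the `POLICIES` list, the role names, and the deny-by-default rule for tags with no policy are all hypothetical design choices, not the behavior of any specific governance product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    tag: str                   # classification tag the policy applies to
    allowed_roles: frozenset   # roles permitted to read data carrying that tag

# Defined once, up front, with no knowledge of future tables (hypothetical values).
POLICIES = [
    Policy("PII", frozenset({"privacy_officer"})),
    Policy("PCI", frozenset({"billing", "privacy_officer"})),
]

def can_read(role, tags, policies=POLICIES):
    """A role may read a column only if every tag on it has a policy that allows the role."""
    by_tag = {p.tag: p for p in policies}
    return all(t in by_tag and role in by_tag[t].allowed_roles for t in tags)

def mask_row(row, column_tags, role):
    """Return a copy of the row with columns the role may not read replaced by '****'."""
    return {
        column: value if can_read(role, column_tags.get(column, set())) else "****"
        for column, value in row.items()
    }

# A table created after the policies were written; only its tags are new information.
new_tags = {"ssn": {"PII"}, "diagnosis": {"HIPAA"}}
row = {"ssn": "123-45-6789", "diagnosis": "J45", "city": "Austin"}
```

Here the `HIPAA` tag has no policy, so the sketch masks that column for every role. Denying by default is one reasonable choice when new data arrives with a tag nobody has written a rule for yet.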
You can use a Data Access Governance platform to define policies like:
Read: A Four-Step Framework for Data Centric Security
This approach has an immediate and positive impact on the data creation → classification → security → consumption lifecycle.
The pace and scale of data creation are only going to increase. Overburdened security organizations are already struggling to keep up with delivering visibility and granular controls over who should have access to that data. A data-centric security approach will help you pre-define business-level policies about how your data should be protected, and ensure that they are cohesively and universally applied to all your data, as it’s being born and regardless of where it resides.