Deploy Great Expectations in hosted environments without a file system
The components in the great_expectations.yml file define the Validation Results Stores, Data Source connections, and Data Docs hosts for a Data Context. In hosted environments, such as Databricks, Amazon EMR, and Google Cloud Composer, a writable file system might not be available, so these file-based components can be inaccessible. The information provided here is intended to help you use Great Expectations in hosted environments.
Configure your Data Context
To use code to create a Data Context, see Instantiate an Ephemeral Data Context.
To configure a Data Context for a specific environment, see one of the following resources:
- How to instantiate a Data Context on an EMR Spark cluster
- How to use Great Expectations in Databricks
Create Expectation Suites and add Expectations
To add a Data Source and an Expectation Suite, see How to connect to a PostgreSQL database.
To add Expectations to your Suite individually, use the following code:
# Add an Expectation to the Validator's in-memory Expectation Suite
validator.expect_column_values_to_not_be_null("my_column")
# Persist the Suite, keeping any Expectations that failed during validation
validator.save_expectation_suite(discard_failed_expectations=False)
To configure your Expectation store to load a Suite at a later time, see one of the following resources:
- How to configure an Expectation store to use Amazon S3
- How to configure an Expectation store to use Azure Blob Storage
- How to configure an Expectation store to use GCS
- How to configure an Expectation store to use a filesystem
- How to configure an Expectation store to use PostgreSQL
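For example, an Expectation store backed by Amazon S3 might look like the following fragment of great_expectations.yml (the store name, bucket, and prefix here are placeholders; see the Amazon S3 guide above for the full procedure):

```yaml
stores:
  expectations_S3_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: "<your-bucket>"
      prefix: "<your-prefix>"

expectations_store_name: expectations_S3_store
```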
Run validation
To create and run a Checkpoint in code, see How to create a new Checkpoint. In a hosted environment, you will not be able to store the Checkpoint for repeated use across Python sessions, but you can recreate it each time your scripts run.
Use Data Docs
To build and view Data Docs in your environment, see Options for hosting Data Docs.