Skip to content

CREATE STORAGE INTEGRATION

The CREATE STORAGE INTEGRATION command defines a named external storage integration that can be used to access data in external storage systems like AWS S3, Google Cloud Storage (GCS), or Microsoft Azure Blob Storage.

Syntax

CREATE STORAGE INTEGRATION [IF NOT EXISTS] integration_name
TYPE = STORAGE
STORAGE_PROVIDER = provider_type
ENABLED = [TRUE|FALSE]
[STORAGE_AWS_ROLE_ARN = 'aws_role_arn']
[STORAGE_AWS_EXTERNAL_ID = 'external_id']
[GCP_SERVICE_ACCOUNT = 'service_account_email']
[AZURE_TENANT_ID = 'tenant_id']
[AZURE_APPLICATION_ID = 'application_id']
[STORAGE_ALLOWED_LOCATIONS = ('location1', 'location2', ...)]
[CATALOG_TYPE = 'catalog_type']
[CATALOG_NAMESPACE = 'namespace']
[COMMENT = 'comment_string']
  • IF NOT EXISTS: (Optional) Prevents an error if an integration with the same name already exists.
  • integration_name: The name of the storage integration to create.
  • TYPE = STORAGE: Specifies that this is a storage integration.
  • STORAGE_PROVIDER = provider_type: The type of storage provider. Valid values are:
    • S3: Amazon S3
    • GCS: Google Cloud Storage
    • AZURE: Microsoft Azure Blob Storage
  • ENABLED = [TRUE|FALSE]: Whether the integration is enabled.
  • STORAGE_AWS_ROLE_ARN = 'aws_role_arn': (Required for S3) The ARN of the AWS IAM role that ScramDB will assume to access S3.
  • STORAGE_AWS_EXTERNAL_ID = 'external_id': (Optional for S3) The external ID to use when assuming the AWS IAM role.
  • GCP_SERVICE_ACCOUNT = 'service_account_email': (Required for GCS) The email address of the GCP service account.
  • AZURE_TENANT_ID = 'tenant_id': (Required for Azure) The Azure tenant ID.
  • AZURE_APPLICATION_ID = 'application_id': (Required for Azure) The Azure application ID.
  • STORAGE_ALLOWED_LOCATIONS = ('location1', 'location2', ...): (Optional) A list of allowed storage locations. Each location should be a URI (e.g., s3://bucket/path/, gs://bucket/path/, azure://container/path/).
  • CATALOG_TYPE = 'catalog_type': (Optional) The type of catalog to use with this storage integration (e.g., GLUE).
  • CATALOG_NAMESPACE = 'namespace': (Optional) The namespace to use with the catalog.
  • COMMENT = 'comment_string': (Optional) A comment describing the integration.

Provider-Specific Parameters

Amazon S3

CREATE STORAGE INTEGRATION s3_integration
TYPE = STORAGE
STORAGE_PROVIDER = S3
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-role'
STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/path/')
COMMENT = 'S3 integration for data lake access'

Google Cloud Storage (GCS)

CREATE STORAGE INTEGRATION gcs_integration
TYPE = STORAGE
STORAGE_PROVIDER = GCS
ENABLED = TRUE
GCP_SERVICE_ACCOUNT = 'service-account@project-id.iam.gserviceaccount.com'
STORAGE_ALLOWED_LOCATIONS = ('gs://my-bucket/path/')
COMMENT = 'GCS integration for data lake access'

Microsoft Azure Blob Storage

CREATE STORAGE INTEGRATION azure_integration
TYPE = STORAGE
STORAGE_PROVIDER = AZURE
ENABLED = TRUE
AZURE_TENANT_ID = 'tenant-id'
AZURE_APPLICATION_ID = 'application-id'
STORAGE_ALLOWED_LOCATIONS = ('azure://container/path/')
COMMENT = 'Azure integration for data lake access'

Integration with Catalog

You can also specify catalog parameters to integrate with metadata catalogs like AWS Glue:

CREATE STORAGE INTEGRATION catalog_integration
TYPE = STORAGE
STORAGE_PROVIDER = S3
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-role'
STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/path/')
CATALOG_TYPE = 'GLUE'
CATALOG_NAMESPACE = 'my_namespace'
COMMENT = 'S3 integration with Glue catalog'

Usage

After creating a storage integration, you can use it in CREATE EXTERNAL TABLE commands to access data in the external storage system:

CREATE EXTERNAL TABLE my_table (
col1 INT,
col2 VARCHAR
)
STORED AS PARQUET
LOCATION 's3://my-bucket/path/data.parquet'
TBLPROPERTIES (
'storage_integration' = 's3_integration'
);

Information Schema

You can view information about storage integrations using the information_schema.storage_integrations view:

SELECT * FROM information_schema.storage_integrations;