CREATE STORAGE INTEGRATION
The CREATE STORAGE INTEGRATION
command defines a named external storage integration that can be used to access data in external storage systems like AWS S3, Google Cloud Storage (GCS), or Microsoft Azure Blob Storage.
Syntax
CREATE STORAGE INTEGRATION [IF NOT EXISTS] integration_name TYPE = STORAGE STORAGE_PROVIDER = provider_type ENABLED = [TRUE|FALSE] [STORAGE_AWS_ROLE_ARN = 'aws_role_arn'] [STORAGE_AWS_EXTERNAL_ID = 'external_id'] [GCP_SERVICE_ACCOUNT = 'service_account_email'] [AZURE_TENANT_ID = 'tenant_id'] [AZURE_APPLICATION_ID = 'application_id'] [STORAGE_ALLOWED_LOCATIONS = ('location1', 'location2', ...)] [CATALOG_TYPE = 'catalog_type'] [CATALOG_NAMESPACE = 'namespace'] [COMMENT = 'comment_string']
IF NOT EXISTS
: (Optional) Prevents an error if an integration with the same name already exists.integration_name
: The name of the storage integration to create.TYPE = STORAGE
: Specifies that this is a storage integration.STORAGE_PROVIDER = provider_type
: The type of storage provider. Valid values are:S3
: Amazon S3GCS
: Google Cloud StorageAZURE
: Microsoft Azure Blob Storage
ENABLED = [TRUE|FALSE]
: Whether the integration is enabled.STORAGE_AWS_ROLE_ARN = 'aws_role_arn'
: (Required for S3) The ARN of the AWS IAM role that ScramDB will assume to access S3.STORAGE_AWS_EXTERNAL_ID = 'external_id'
: (Optional for S3) The external ID to use when assuming the AWS IAM role.GCP_SERVICE_ACCOUNT = 'service_account_email'
: (Required for GCS) The email address of the GCP service account.AZURE_TENANT_ID = 'tenant_id'
: (Required for Azure) The Azure tenant ID.AZURE_APPLICATION_ID = 'application_id'
: (Required for Azure) The Azure application ID.STORAGE_ALLOWED_LOCATIONS = ('location1', 'location2', ...)
: (Optional) A list of allowed storage locations. Each location should be a URI (e.g.,s3://bucket/path/
,gs://bucket/path/
,azure://container/path/
).CATALOG_TYPE = 'catalog_type'
: (Optional) The type of catalog to use with this storage integration (e.g.,GLUE
).CATALOG_NAMESPACE = 'namespace'
: (Optional) The namespace to use with the catalog.COMMENT = 'comment_string'
: (Optional) A comment describing the integration.
Provider-Specific Parameters
Amazon S3
CREATE STORAGE INTEGRATION s3_integration TYPE = STORAGE STORAGE_PROVIDER = S3 ENABLED = TRUE STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-role' STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/path/') COMMENT = 'S3 integration for data lake access'
Google Cloud Storage (GCS)
CREATE STORAGE INTEGRATION gcs_integration TYPE = STORAGE STORAGE_PROVIDER = GCS ENABLED = TRUE GCP_SERVICE_ACCOUNT = 'service-account@project-id.iam.gserviceaccount.com' STORAGE_ALLOWED_LOCATIONS = ('gs://my-bucket/path/') COMMENT = 'GCS integration for data lake access'
Microsoft Azure Blob Storage
CREATE STORAGE INTEGRATION azure_integration TYPE = STORAGE STORAGE_PROVIDER = AZURE ENABLED = TRUE AZURE_TENANT_ID = 'tenant-id' AZURE_APPLICATION_ID = 'application-id' STORAGE_ALLOWED_LOCATIONS = ('azure://container/path/') COMMENT = 'Azure integration for data lake access'
Integration with Catalog
You can also specify catalog parameters to integrate with metadata catalogs like AWS Glue:
CREATE STORAGE INTEGRATION catalog_integration TYPE = STORAGE STORAGE_PROVIDER = S3 ENABLED = TRUE STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-role' STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/path/') CATALOG_TYPE = 'GLUE' CATALOG_NAMESPACE = 'my_namespace' COMMENT = 'S3 integration with Glue catalog'
Usage
After creating a storage integration, you can use it in CREATE EXTERNAL TABLE
commands to access data in the external storage system:
CREATE EXTERNAL TABLE my_table ( col1 INT, col2 VARCHAR)STORED AS PARQUETLOCATION 's3://my-bucket/path/data.parquet'TBLPROPERTIES ( 'storage_integration' = 's3_integration');
Information Schema
You can view information about storage integrations using the information_schema.storage_integrations
view:
SELECT * FROM information_schema.storage_integrations;