Posted On: Feb 24, 2023

AWS Glue Crawlers now integrate with AWS Lake Formation, simplifying crawler setup and supporting centralized permissions for in-account and cross-account crawling of Amazon S3 data lakes.

AWS Glue Crawlers are used to discover datasets, extract schema information, and populate the AWS Glue Data Catalog. Before this integration, you needed to set up AWS Identity and Access Management (IAM) policies and Amazon S3 bucket policies to give the crawler access to S3 data lake targets. Customers who use Lake Formation to manage these targets preferred having all permissions centralized in Lake Formation instead of setting up direct S3 access for the crawler role. With this Glue Crawler and Lake Formation integration, you can now use Lake Formation permissions to control the crawler's access to your Lake Formation managed tables.

When you configure the AWS Glue Crawler to use Lake Formation, by default the crawler uses Lake Formation in the same account to obtain data access credentials. However, you can also configure the crawler to use Lake Formation in a different account by providing that account's ID when you create the crawler. The cross-account capability allows customers to manage permissions from a central governance account. Customers prefer the central governance experience over writing bucket policies separately in each bucket-owning account. To build a data mesh architecture, you can author permissions in a single Lake Formation governance account to manage access to data locations and crawlers spanning multiple accounts in your data lake.
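
As a rough illustration of the configuration described above, the following Python sketch builds a request for the Glue CreateCrawler API with the LakeFormationConfiguration setting enabled. The helper names (build_crawler_request, create_crawler) and all sample values (crawler name, role ARN, database, bucket path, account IDs) are hypothetical and not taken from this announcement. The sketch assumes the boto3 SDK is installed and that the crawler role already has the Lake Formation permissions it needs.

```python
from typing import Any, Dict, Optional


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_path: str,
    governance_account_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the Glue CreateCrawler call.

    Hypothetical helper for illustration only. When governance_account_id
    is omitted, the crawler uses Lake Formation in its own account.
    """
    lf_config: Dict[str, Any] = {"UseLakeFormationCredentials": True}
    if governance_account_id is not None:
        # Cross-account case: name the account whose Lake Formation
        # permissions should govern the crawler's data access.
        lf_config["AccountId"] = governance_account_id
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "LakeFormationConfiguration": lf_config,
    }


def create_crawler(request: Dict[str, Any]) -> None:
    """Send the request to AWS Glue (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without the SDK

    boto3.client("glue").create_crawler(**request)
```

Calling build_crawler_request without governance_account_id corresponds to the default same-account behavior; passing an account ID corresponds to the central governance account case. Keeping the request builder separate from the API call makes it possible to inspect the configuration before any crawler is created.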

AWS Glue Crawler support for Lake Formation is generally available in all regions where both AWS Glue and Lake Formation are available. For a list of regions, see the AWS Region Table. Read the blog post, and visit the AWS Glue Crawler documentation to learn more.