AWS Certified Data Engineer - Associate (DEA-C01) glossary
Terms selected for the AWS Certified Data Engineer - Associate (DEA-C01) exam based on common objective language and practice focus.
Athena Partition Projection
Technique for generating partition metadata at query time without storing every partition entry in a catalog.
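The idea can be illustrated with a minimal sketch: instead of looking partitions up in a catalog, the engine derives them from configuration. The `config` dict below is a hypothetical stand-in for table properties such as `projection.dt.type=date` and `projection.dt.range`; Athena does this computation internally.

```python
from datetime import datetime, timedelta

# Hypothetical projection config; mirrors the spirit of Athena's
# projection.dt.* table properties, not their exact representation.
config = {"type": "date", "range": ("2024-01-01", "2024-01-03"), "format": "%Y-%m-%d"}

def project_partitions(cfg):
    """Enumerate partition values from the config at query time,
    rather than reading stored partition entries from a catalog."""
    fmt = cfg["format"]
    start = datetime.strptime(cfg["range"][0], fmt)
    end = datetime.strptime(cfg["range"][1], fmt)
    days = (end - start).days
    return [(start + timedelta(d)).strftime(fmt) for d in range(days + 1)]

parts = project_partitions(config)  # ["2024-01-01", "2024-01-02", "2024-01-03"]
```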
Batch Ingestion Patterns
Techniques for ingesting large volumes of data at scheduled intervals using AWS services like AWS Batch or AWS Glue.
Change Data Capture (CDC)
Pattern that captures inserts, updates, and deletes from source systems for incremental downstream processing.
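A minimal sketch of the replay side of CDC: change records carry an operation flag, and the consumer upserts or deletes accordingly. The `op` codes here (I/U/D) are illustrative, similar in spirit to what AWS DMS emits but not its exact format.

```python
# Hypothetical CDC records with an "op" field: I=insert, U=update, D=delete.
changes = [
    {"op": "I", "id": 1, "name": "alice"},
    {"op": "I", "id": 2, "name": "bob"},
    {"op": "U", "id": 2, "name": "bobby"},
    {"op": "D", "id": 1},
]

def apply_cdc(target, records):
    """Replay inserts, updates, and deletes against a keyed target table."""
    for rec in records:
        if rec["op"] == "D":
            target.pop(rec["id"], None)
        else:  # inserts and updates both upsert the latest row image
            target[rec["id"]] = {k: v for k, v in rec.items() if k != "op"}
    return target

state = apply_cdc({}, changes)  # only id=2 survives, with its updated name
```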
Compliance Support
Ensuring data systems meet regulatory requirements and support auditability.
Data Governance
Establishing policies and procedures to manage data availability, usability, integrity, and security.
Data Lake Architecture
Scalable storage and processing pattern for structured and unstructured data, often using staged zones for ingestion, refinement, and consumption.
Data Lifecycle Management
Strategies for managing data from creation to deletion, including archiving and purging policies.
Data Quality Management
Ensuring the accuracy, completeness, and reliability of data through validation and cleansing processes.
Data Quality Rule
Validation logic that enforces constraints such as null checks, uniqueness, ranges, and referential integrity.
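A rule set like this can be sketched in a few lines; the field names and rule labels below are illustrative, not tied to any specific AWS service.

```python
def check_rules(rows):
    """Apply null, uniqueness, and range checks; return (row_index, rule) violations."""
    violations = []
    seen_ids = set()
    for i, row in enumerate(rows):
        if row.get("id") is None:
            violations.append((i, "null_id"))        # null check
        elif row["id"] in seen_ids:
            violations.append((i, "duplicate_id"))   # uniqueness check
        else:
            seen_ids.add(row["id"])
        if row.get("price", 0) < 0:
            violations.append((i, "negative_price"))  # range check
    return violations

rows = [{"id": 1, "price": 9.5}, {"id": 1, "price": -2.0}, {"id": None}]
issues = check_rules(rows)  # duplicate and range hits on row 1, null on row 2
```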
Data Store Partitioning
Dividing data into distinct partitions to improve query performance and manageability.
Data Transformation
The process of converting data from one format or structure into another using AWS services such as AWS Glue or AWS Lambda.
EMR Spark Processing
Distributed data processing on Amazon EMR using Apache Spark for large-scale ETL and analytics workloads.
Encryption Techniques
Methods for securing data at rest and in transit using AWS encryption services.
Fit-for-Purpose Data Stores
Selecting the most appropriate AWS data storage solution based on specific workload requirements.
AWS Glue Crawler
Service component that scans data stores and infers table schemas into the Glue Data Catalog.
Glue Data Catalog
Central metadata repository used by AWS analytics services to discover and query datasets.
Glue Job Bookmark
State tracking feature that allows incremental ETL processing by remembering previously processed data.
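A minimal sketch of bookmark-style state, assuming a monotonically increasing key (the file path and `last_key` field are illustrative, not Glue's actual bookmark format): persist the highest key processed so the next run reads only newer records.

```python
import json
import os
import tempfile

def load_bookmark(path):
    """Return the last processed key, or "" when no bookmark exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["last_key"]
    return ""

def run_incremental(records, path):
    """Process only records newer than the bookmark, then advance it."""
    last = load_bookmark(path)
    new = [r for r in records if r["key"] > last]
    if new:
        with open(path, "w") as f:
            json.dump({"last_key": max(r["key"] for r in new)}, f)
    return new

path = os.path.join(tempfile.mkdtemp(), "bookmark.json")
batch = [{"key": "2024-01-01"}, {"key": "2024-01-02"}]
first = run_incremental(batch, path)   # first run: both records are new
second = run_incremental(batch, path)  # rerun: bookmark filters both out
```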
Identity and Access Management
Applying AWS IAM policies to control access to data resources and ensure security compliance.
Indexing Strategies
Techniques for creating indexes to enhance the speed of data retrieval operations.
Kinesis Data Streams Shard
Capacity unit in Kinesis Data Streams that determines ingestion and read throughput limits.
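Sizing follows directly from the documented per-shard quotas: 1 MB/s or 1,000 records/s for writes and 2 MB/s for reads. A stream must have enough shards to satisfy the tightest of the three limits:

```python
import math

# Documented per-shard quotas for Kinesis Data Streams.
WRITE_MB_S, WRITE_RECORDS_S, READ_MB_S = 1, 1000, 2

def shards_needed(in_mb_s, in_records_s, out_mb_s):
    """Minimum shard count that satisfies write volume, write record
    rate, and read volume simultaneously."""
    return max(
        math.ceil(in_mb_s / WRITE_MB_S),
        math.ceil(in_records_s / WRITE_RECORDS_S),
        math.ceil(out_mb_s / READ_MB_S),
    )

n = shards_needed(5, 12000, 8)  # 12 — the record rate is the bottleneck
```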
Kinesis Data Firehose
Managed streaming delivery service that buffers, optionally transforms, and writes data to destinations like S3 and Redshift.
Lake Formation Permissions
Fine-grained data access controls for lake resources including table, column, and row-level permissions.
Monitoring and Alerting
Setting up systems to track the health and performance of data platforms and notify stakeholders of issues.
Network Security Controls
Implementing VPC, security groups, and NACLs to protect data systems from unauthorized access.
Operational Automation
Automating routine tasks and deployment processes to improve efficiency and reduce errors.
Data Pipeline Orchestration
Scheduling and dependency management for multi-step data workflows including retries and alerts.
Parquet
Columnar file format optimized for analytical queries, compression, and predicate pushdown.
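Why predicate pushdown works can be shown with a toy columnar layout: each row group stores per-column values plus min/max statistics, and a scan can skip whole groups whose statistics rule out a match. This is a simplified model of Parquet's actual structure, not its file format.

```python
# Toy row groups for one column, each with min/max stats like Parquet's.
row_groups = [
    {"price": [3, 7, 9], "stats": {"min": 3, "max": 9}},
    {"price": [12, 15, 20], "stats": {"min": 12, "max": 20}},
]

def scan_gt(groups, threshold):
    """Read only groups whose max exceeds the threshold, then filter rows."""
    out, skipped = [], 0
    for g in groups:
        if g["stats"]["max"] <= threshold:
            skipped += 1  # whole group pruned without reading its values
            continue
        out.extend(v for v in g["price"] if v > threshold)
    return out, skipped

values, pruned = scan_gt(row_groups, 10)  # first group never read
```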
Pipeline Resilience
Designing data pipelines to be fault-tolerant and recoverable in case of failures.
Pipeline Troubleshooting
Identifying and resolving issues that cause data pipeline failures or bottlenecks.
Redshift Distribution Style
Data placement method across cluster nodes that affects join performance and data movement.
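KEY distribution can be sketched as hashing the distribution key to a slice, so rows sharing a key are co-located and joins on that key avoid data movement. The CRC32 modulo below is a deterministic stand-in for Redshift's internal hash, and the slice count is arbitrary.

```python
import zlib

def assign_slice(dist_key, num_slices=4):
    """Deterministic stand-in for Redshift's distribution hash."""
    return zlib.crc32(str(dist_key).encode()) % num_slices

orders = [{"customer_id": c} for c in ("a", "b", "a", "c")]
placement = {}
for row in orders:
    placement.setdefault(assign_slice(row["customer_id"]), []).append(row["customer_id"])
# Both "a" rows land on the same slice, so a join keyed on
# customer_id can run without shuffling rows between nodes.
```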
Redshift Sort Key
Column ordering strategy that improves scan efficiency for filtered and range-based queries.
Redshift Spectrum
Feature that allows Amazon Redshift to query data directly in S3 alongside local warehouse tables.
S3 Partitioning Strategy
Folder/key design approach that organizes data by high-selectivity attributes to improve query pruning and performance.
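A common concrete form is the Hive-style layout, where partition columns appear as `name=value` path segments that engines like Athena and Glue can prune on. The bucket, table, and column names below are hypothetical.

```python
from datetime import date

def s3_key(bucket, table, event_date, region, filename):
    """Build a Hive-style key: partition columns as name=value segments."""
    return (f"s3://{bucket}/{table}/"
            f"year={event_date:%Y}/month={event_date:%m}/day={event_date:%d}/"
            f"region={region}/{filename}")

key = s3_key("my-data-lake", "orders", date(2024, 3, 9), "eu-west-1", "part-0000.parquet")
# s3://my-data-lake/orders/year=2024/month=03/day=09/region=eu-west-1/part-0000.parquet
```

A query filtering on `year`, `month`, and `region` then scans only the matching prefixes instead of the whole table.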
Schema Evolution
Controlled process for handling structural data changes over time while preserving pipeline compatibility.
Schema Evolution Handling
Techniques for managing changes in data schema over time without disrupting data processing pipelines.
Streaming Ingestion Patterns
Methods for continuously ingesting data from streaming sources using services such as Amazon Kinesis Data Streams, often with AWS Lambda as a stream consumer.
