CompTIA Data+ (DA0-002) glossary
Terms selected for CompTIA Data+ (DA0-002) based on common objective language and practice focus.
Data Encryption
Data encryption involves converting data into a coded format to prevent unauthorized access and ensure confidentiality.
Bar Chart
Visualization that uses horizontal or vertical bars to compare categorical values.
Cloud Computing
Cloud computing is the delivery of computing services over the internet, allowing for on-demand access to resources such as servers, storage, and applications.
Data Type
A data type defines the kind of values a field or variable can hold, such as integers, strings, or booleans, and the operations that can be performed on them.
Histogram
Chart that groups continuous data into bins to show frequency distribution.
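As a rough illustration of binning, the sketch below counts values into fixed-width bins and prints a text histogram. The function name `histogram`, the sample `ages` list, and the bin width of 10 are assumptions made for this example only.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each fixed-width bin."""
    counts = Counter()
    for v in values:
        # The bin is identified by its lower edge, e.g. 34 falls in the bin starting at 30.
        bin_start = (v // bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))

ages = [21, 23, 25, 31, 34, 38, 39, 42, 47, 58]  # illustrative data
age_bins = histogram(ages, bin_width=10)

for start, count in age_bins.items():
    print(f"{start}-{start + 9}: {'#' * count}")
```

Changing `bin_width` changes the shape of the chart, which is why bin choice matters when reading a histogram.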
Non-relational Database
A type of database that does not use the tabular schema of rows and columns found in most traditional database systems. It is often used for large sets of distributed data.
Relational Database
Database that organizes data into tables with rows and columns linked by key relationships.
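A minimal sketch of tables linked by a key, using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their contents are hypothetical and exist only to show a join across a key relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# Join the two tables on the shared key and total the order amounts per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)
conn.close()
```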
Scatter Plot
Chart that plots individual data points on two axes to show relationships or clusters.
Correlation
Statistical measure of the strength and direction of a relationship between two variables.
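One common correlation measure is the Pearson coefficient, which ranges from -1 (perfect negative relationship) to +1 (perfect positive relationship). The sketch below computes it by hand; the helper `pearson_r` and the sample data are assumptions for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]               # illustrative data
scores_up = [52, 58, 65, 70, 78]      # rises with hours: strong positive
scores_down = [90, 80, 70, 60, 50]    # falls with hours: perfect negative

print(round(pearson_r(hours, scores_up), 3))
print(round(pearson_r(hours, scores_down), 3))
```

The sign gives the direction of the relationship and the magnitude gives its strength. A high value does not by itself show that one variable causes the other.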
Data Cleansing
Correcting or removing inaccurate, incomplete, or irrelevant records from a dataset.
Data Lake
Centralized repository that stores raw structured and unstructured data at any scale.
Data Profiling
Analyzing a dataset to understand its structure, quality, and content characteristics.
KPI
A Key Performance Indicator (KPI) is a measurable value that tracks progress toward a business objective.
Null Value
Placeholder indicating that a data field contains no value.
Outlier
A data point significantly different from the rest of the dataset.
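"Significantly different" needs a working rule in practice. One widely used convention, assumed here, flags values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The helper `iqr_outliers` and the sample data are hypothetical.

```python
import statistics

def iqr_outliers(values):
    """Return values outside the 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

response_times = [10, 12, 11, 13, 12, 11, 14, 95]  # illustrative data
outliers = iqr_outliers(response_times)
print(outliers)
```

Other rules exist, such as z-score thresholds, and whether a flagged point is an error or a genuine extreme value still needs a human decision.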
Regression Analysis
Statistical technique that models the relationship between a dependent variable and one or more independent variables.
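The simplest case is a straight line fitted to one independent variable by ordinary least squares. The sketch below is a minimal version of that idea; `fit_line`, `ad_spend`, and `sales` are names invented for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

ad_spend = [1, 2, 3, 4]   # independent variable (illustrative)
sales = [3, 5, 7, 9]      # dependent variable; follows y = 2x + 1 exactly

slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 5 + intercept  # estimate for an unseen x value
print(slope, intercept, predicted)
```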
Standard Deviation
Measure of how spread out values are from the mean of a dataset.
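To make "spread" concrete, the sketch below compares two made-up datasets that share the same mean but differ in how far values sit from it, using the standard library's `statistics.pstdev` (population standard deviation).

```python
import statistics

tight = [48, 49, 50, 51, 52]   # values close to the mean (illustrative)
wide = [10, 30, 50, 70, 90]    # same mean, values far from it

print(statistics.mean(tight), statistics.pstdev(tight))
print(statistics.mean(wide), statistics.pstdev(wide))
```

When the data is a sample rather than the whole population, `statistics.stdev` is typically used instead; it divides by n - 1 rather than n.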
API Ingestion
Pulling data into a pipeline by calling external application programming interfaces.
BI Tool
Business Intelligence software used to visualize data and create interactive dashboards.
Containerization
Packaging an application and its dependencies into an isolated, portable runtime unit.
CSV
Comma-Separated Values flat-file format commonly used for tabular data exchange.
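A short sketch of reading CSV text with Python's built-in `csv` module. The `region` and `units` columns are made up for the example. Note that every field arrives as a string, so numeric columns need explicit conversion.

```python
import csv
import io

csv_text = "region,units\nNorth,120\nSouth,95\n"  # illustrative file contents

# DictReader uses the first row as column names.
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_units = sum(int(row["units"]) for row in rows)

print(rows)
print(total_units)
```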
Dashboard
Visual interface that consolidates key metrics and charts for at-a-glance monitoring.
Data Compliance
Data compliance involves adhering to laws and regulations governing data privacy, security, and management, such as GDPR or HIPAA.
Data Dashboard
A data dashboard is an interactive tool that displays key metrics and data points in a visual format, allowing users to monitor performance and make informed decisions.
Data Deduplication
Process of identifying and removing duplicate records from a dataset.
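A minimal sketch of deduplication, assuming records are dictionaries and that two records count as duplicates when their key fields match after trimming whitespace and ignoring case. The `deduplicate` helper and the sample contacts are hypothetical.

```python
def deduplicate(records, key_fields):
    """Keep the first record seen for each normalized key."""
    seen = set()
    unique = []
    for record in records:
        # Normalize so "ANA@example.com " and "ana@example.com" match.
        key = tuple(record[field].strip().lower() for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

contacts = [
    {"name": "Ana Diaz", "email": "ana@example.com"},
    {"name": "Ana Diaz", "email": "ANA@example.com "},
    {"name": "Bo Chen", "email": "bo@example.com"},
]
unique_contacts = deduplicate(contacts, key_fields=["email"])
print(unique_contacts)
```

Which fields define a duplicate is a business decision; matching on email alone is only one possible choice.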
Data Documentation
Data documentation involves creating comprehensive records of data sources, structures, and processes to ensure clarity and consistency in data management.
Data Exploration
Data exploration is the initial step in data analysis where analysts use visual and statistical methods to understand the data's characteristics and identify patterns.
Data Governance
Framework of policies, roles, and processes that ensure data quality, security, and compliance.
Data Integration
Data integration involves combining data from different sources to provide a unified view, often using ETL processes or data virtualization.
Data Lineage
Documentation that traces data from origin through transformations to its final destination.
Data Masking
Technique that replaces sensitive data with fictional but realistic values.
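One simple way to keep masked values realistic is to preserve the original format while swapping out the sensitive characters. The sketch below replaces each digit with a random digit; `mask_digits` and the sample phone number are assumptions for illustration, and this is not a production-grade masking scheme.

```python
import random

def mask_digits(value, rng):
    """Replace every digit with a random digit, keeping length and punctuation."""
    return "".join(
        str(rng.randrange(10)) if ch.isdigit() else ch
        for ch in value
    )

rng = random.Random(42)      # seeded only so the demo is repeatable
original = "555-867-5309"    # illustrative value
masked = mask_digits(original, rng)
print(masked)
```

The masked value still looks like a phone number, so test systems and reports keep working, but it no longer reveals the real one.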
Data Monitoring
Data monitoring involves continuously observing data processes and flows to ensure data quality, integrity, and performance.
Data Quality
Measure of data accuracy, completeness, consistency, and timeliness for its intended use.
Data Query
A data query is a request for information from a database, often written in a query language like SQL, to retrieve specific data.
Data Retention Policy
Rules defining how long data must be kept and when it should be deleted.
Data Structure
Data structures are specific ways of organizing and storing data in a computer so that it can be accessed and modified efficiently.
Data Transformation
Data transformation involves converting data into a suitable format or structure for analysis, which may include normalization, aggregation, or encoding.
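As one example of a transformation, min-max normalization rescales a numeric column to the range 0 to 1 so that columns with different units can be compared. The helper `min_max_normalize` and the `revenue` values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

revenue = [200, 350, 500]  # illustrative data
scaled = min_max_normalize(revenue)
print(scaled)
```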
Data Versioning
Data versioning is the practice of tracking and managing changes to data over time, enabling rollback to previous states if necessary.
Data Visualization
Data visualization is the graphical representation of data to help stakeholders understand complex data insights through visual elements like charts and graphs.
Data Warehouse
Optimized analytical store that organizes data into schemas for fast querying and reporting.
ETL
Extract, Transform, Load process that moves data from sources into a target system after reshaping it.
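The three stages can be sketched with the standard library alone: extract rows from CSV text, transform them by cleaning names and converting types, and load them into an in-memory SQLite table. The `payments` table, the sample data, and the function names are all assumptions made for this example.

```python
import csv
import io
import sqlite3

raw_csv = "name,amount\n alice ,10.5\nBOB,4\n"  # illustrative source data

def extract(text):
    """Read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Tidy names and convert amounts to numbers."""
    return [(row["name"].strip().title(), float(row["amount"])) for row in rows]

def load(conn, rows):
    """Write the reshaped rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(raw_csv)))
loaded = conn.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()
print(loaded)
conn.close()
```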
Heat Map
Visualization using color intensity to represent magnitude across a matrix.
JSON
JavaScript Object Notation is a lightweight text format for structured data interchange.
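A brief sketch of parsing and producing JSON with Python's built-in `json` module. The record contents are invented for the example. JSON objects become dictionaries, arrays become lists, and `true` becomes `True`.

```python
import json

text = '{"id": 7, "tags": ["sales", "q1"], "active": true}'  # illustrative payload

record = json.loads(text)                        # text to Python objects
round_trip = json.dumps(record, sort_keys=True)  # Python objects back to text

print(record["tags"][0], record["active"])
print(round_trip)
```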
Mean, Median, Mode
Basic descriptive statistics: mean is the average, median the middle value, mode the most frequent.
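The three measures can disagree noticeably when a dataset is skewed. The sketch below uses the standard library's `statistics` module on a made-up list of salaries (in thousands) where one high value pulls the mean upward.

```python
import statistics

salaries = [40, 45, 45, 50, 120]  # illustrative data with one high value

mean_salary = statistics.mean(salaries)      # sum divided by count
median_salary = statistics.median(salaries)  # middle value when sorted
mode_salary = statistics.mode(salaries)      # most frequent value

print(mean_salary, median_salary, mode_salary)
```

Here the median and mode stay near the typical salary while the mean is higher, which is why the median is often preferred for skewed data.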
NLP
Natural Language Processing enables machines to interpret and generate human language.
On-Premises
On-premises refers to IT infrastructure and resources that are hosted and managed within an organization's own facilities.
Parquet
Columnar storage file format optimized for analytical queries on large datasets.
PII
Personally Identifiable Information is any data that can be used, alone or combined with other data, to identify a specific individual.
Report Validation
Report validation confirms that data reports are accurate, complete, and consistent with the source data, typically by cross-checking reported figures against that source.
Schema-on-Read
Approach where data is stored raw and structure is applied at query time.
Schema-on-Write
Approach where data must conform to a defined schema before it is stored.
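The difference between the two schema approaches comes down to when validation happens. The sketch below contrasts them with a toy schema; `SCHEMA`, `conforms`, `write_validated`, and `read_with_schema` are hypothetical names, and real systems enforce schemas in the database or query engine rather than in application code like this.

```python
import json

SCHEMA = {"id": int, "amount": float}  # hypothetical schema: field name to type

def conforms(record, schema):
    """True if the record has exactly the schema's fields with the right types."""
    return set(record) == set(schema) and all(
        isinstance(record[name], kind) for name, kind in schema.items()
    )

# Schema-on-write: reject bad records before they are stored.
validated_store = []

def write_validated(record):
    if not conforms(record, SCHEMA):
        raise ValueError(f"record does not match schema: {record}")
    validated_store.append(record)

write_validated({"id": 1, "amount": 9.5})
try:
    write_validated({"id": "oops"})
    rejected = False
except ValueError:
    rejected = True

# Schema-on-read: store anything raw, apply structure only when querying.
raw_store = ['{"id": 1, "amount": 9.5}', '{"id": "oops"}']

def read_with_schema(raw_lines):
    for line in raw_lines:
        record = json.loads(line)
        if conforms(record, SCHEMA):
            yield record

readable = list(read_with_schema(raw_store))
print(validated_store, rejected, readable)
```

Schema-on-write pays the validation cost up front, as in a data warehouse; schema-on-read keeps ingestion flexible and defers that cost to query time, as in a data lake.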
Statistical Techniques
Statistical techniques involve methods such as hypothesis testing, regression, and variance analysis to interpret data and draw conclusions.
Supervised Learning
Machine learning approach where a model is trained on labeled input-output pairs.
Unsupervised Learning
Machine learning approach where a model finds patterns in data without predefined labels.
