Artificial Intelligence Glossary
A collection of artificial intelligence (AI) terms, jargon, and definitions that you and your team should be aware of.
-
Accuracy
Accuracy refers to the closeness of a measured value to a standard or known value. In the context of artificial intelligence, it often denotes the degree of correctness of a model's predictions compared to the actual outcomes.
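For classification models, accuracy is commonly computed as the fraction of predictions that match the actual labels. The sketch below assumes that interpretation; the function name `accuracy` and the sample labels are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the actual labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)


# Three of the four predictions match the actual labels.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```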
-
Actionable Intelligence
Actionable intelligence is information that can be acted upon or used to make informed decisions. In AI, it refers to insights extracted from data that are relevant and valuable for decision-making or problem-solving.
-
Activation Function
An activation function is a mathematical operation applied to the output of a neural network node. It introduces non-linearity to the model, enabling it to learn complex patterns and make nonlinear transformations of the input data.
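As an illustration, here is a minimal sketch of two widely used activation functions, ReLU and sigmoid, applied to a single node output. The function names are illustrative; deep learning libraries provide their own vectorized versions.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    # Two branches avoid overflow in math.exp for large negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


print(relu(-2.0), relu(3.0))  # 0.0 3.0
print(sigmoid(0.0))           # 0.5
```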
-
Activation Gradient
Activation gradient refers to the derivative of the activation function with respect to the input to the function. It is crucial for training neural networks using gradient descent optimization algorithms.
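For example, the sigmoid function s(x) has the well-known derivative s(x) * (1 - s(x)). The sketch below computes it and compares it against a finite-difference estimate; `numeric_grad` is a hypothetical helper included only for the check.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid with respect to its input."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference estimate of f'(x), used as a sanity check."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


print(sigmoid_grad(0.0))  # 0.25
# The analytic and numeric gradients should agree closely.
print(abs(sigmoid_grad(1.3) - numeric_grad(sigmoid, 1.3)) < 1e-6)
```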
-
Adversarial Examples
Adversarial examples are inputs to machine learning models that are intentionally designed to cause the model to make mistakes. These inputs are often slightly perturbed versions of legitimate data, crafted to exploit vulnerabilities in the model's decision boundary.
-
Anaphora
Anaphora refers to the linguistic phenomenon where a word or phrase refers back to another word or phrase mentioned earlier in the text. In natural language processing, handling anaphora is important for tasks such as coreference resolution.
-
Annotation
Annotation involves the labeling or tagging of data to provide additional information or context. In AI, annotation is commonly used in supervised learning tasks to create labeled datasets for training machine learning models.
-
Artificial Intelligence (AI)
Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning, reasoning, problem-solving, perception, and language understanding.
-
Artificial Neural Network (ANN)
An artificial neural network (ANN) is a computational model inspired by the structure and functioning of biological neural networks. It consists of interconnected nodes (neurons) organized in layers, capable of learning and performing tasks such as classification and regression.
-
Auto-classification
Auto-classification is the process of automatically categorizing or labeling data based on its content or characteristics. It often involves the use of machine learning algorithms to classify data into predefined categories or clusters.
-
Auto-complete
Auto-complete is a feature commonly found in text editing software and search engines that predicts and suggests completions for the current input based on previously entered text or patterns. In AI, it can be implemented using various techniques such as language models and recommendation systems.
-
Bagging
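Short for bootstrap aggregating: a machine learning ensemble technique that trains multiple models on random bootstrap samples of the training data and combines their predictions, typically by voting or averaging.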
-
BERT
Bidirectional Encoder Representations from Transformers, a pre-trained natural language processing model.
-
Bias-Variance Tradeoff
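The balance between bias (error from overly simple assumptions, which leads to underfitting) and variance (error from sensitivity to the particular training data, which leads to overfitting). Reducing one typically increases the other, so models are tuned to minimize their combined effect.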
-
Big Data
Large volumes of data, both structured and unstructured, that inundate a business on a day-to-day basis.
-
Cataphora
A linguistic concept where a pronoun refers to a later noun or phrase.
-
Categorization
The process of organizing items into categories based on their similarities or differences.
-
Category
A group or class of things having some common characteristics or attributes.
-
Category Trees
Hierarchical structures that organize categories into parent-child relationships.
-
Classification
The process of categorizing data points into classes or categories.
-
Clustering
The process of grouping similar data points together.
-
Cognitive Map
A mental representation of physical and spatial information.
-
Composite AI
The integration of multiple AI technologies to create more advanced systems.
-
Computational Linguistics
The interdisciplinary field dealing with the statistical and rule-based modeling of natural language from a computational perspective.
-
Computational Semantics
The branch of computational linguistics and artificial intelligence that focuses on the meaning of words and sentences in a language.
-
Content Enrichment or Enrichment
The process of enhancing content by adding metadata, tags, or other contextual information.
-
Controlled Vocabulary
A predefined list of terms used to tag or categorize content.
-
Conversational AI
AI systems capable of understanding and generating human-like dialogue.
-
Convolution
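A mathematical operation that slides a small filter (kernel) across an input, such as an image, to produce a feature map. It is used in neural networks for feature extraction.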
-
Convolutional Neural Network (CNN)
A type of artificial neural network designed for image recognition and processing.
-
Co-occurrence
The frequency with which two items appear together in a dataset.
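A minimal sketch of counting co-occurrence, assuming the dataset is a list of documents and that two items "appear together" when they occur in the same document. The function name and sample data are illustrative.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents contain each unordered pair of items."""
    counts = Counter()
    for doc in documents:
        # Deduplicate and sort so each pair has one canonical ordering.
        unique_items = sorted(set(doc))
        for pair in combinations(unique_items, 2):
            counts[pair] += 1
    return counts


docs = [["ai", "ml", "data"], ["ai", "ml"], ["data", "ml"]]
counts = cooccurrence_counts(docs)
print(counts[("ai", "ml")])    # 2
print(counts[("ai", "data")])  # 1
```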
-
Corpus
A large and structured set of texts in digital form, used to study language.
-
Data Augmentation
The technique of artificially increasing the size of a dataset by creating modified versions of existing data.
-
Data Discovery
The process of finding and identifying relevant datasets for analysis.
-
Data Drift
The gradual change in the distribution or properties of data over time, leading to model deterioration.
-
Data Extraction
The process of retrieving data from various sources and converting it into a usable format.
-
Data Ingestion
The process of collecting, processing, and importing data into a system or database.
-
Data Labelling
The process of manually annotating data with labels or tags to facilitate supervised learning.
-
Data Scarcity
The situation where there is an insufficient amount of data available for analysis or training models.
-
Decision Tree
A decision support tool that uses a tree-like graph of decisions and their possible consequences.
-
Deep Learning
A subset of machine learning that utilizes artificial neural networks with multiple layers.
-
Did You Mean (DYM)
A feature in search engines or text editing software that suggests corrections for misspelled words or phrases.
-
Disambiguation
The process of resolving ambiguity in natural language or data.
-
Domain Knowledge
Expertise or understanding of a particular subject area or industry.
-
Embedding
A mathematical representation of a word, phrase, or document in a continuous vector space.
-
Emotion AI (Affective Computing)
The branch of artificial intelligence that deals with recognizing, interpreting, processing, and simulating human emotions.
-
Ensemble Learning
A machine learning technique that combines the predictions of multiple models to improve accuracy and robustness.
-
Entity
An object or concept that is identifiable and distinct, often referenced in natural language processing.
-
Environmental, Social, and Governance (ESG)
Criteria used by investors to evaluate a company's impact on society and the environment alongside its financial performance.
-
Ethics in AI
The study of moral principles and guidelines that govern the development and use of artificial intelligence systems.
-
ETL (Extract, Transform, Load)
A process used in data integration to collect data from various sources, transform it into a usable format, and load it into a target database.
-
Explainable AI
Artificial intelligence models and systems that can provide understandable explanations for their decisions and actions.
-
Extraction or Keyphrase Extraction
The process of automatically identifying and extracting key phrases or important information from text documents.
-
Feature Engineering
The process of selecting, transforming, or creating new features from raw data to improve model performance.
-
Federated Learning
A machine learning approach where multiple decentralized devices collaboratively train a model while keeping data localized.
-
Fine-tuning
The process of further training a pre-trained model on a specific dataset to improve its performance on a particular task.
-
F-score (F-measure, F1 measure)
A measure of a test's accuracy that considers both precision and recall, calculated as the harmonic mean of precision and recall.
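The harmonic mean works out to F1 = 2 * precision * recall / (precision + recall). A minimal sketch follows; returning 0.0 when both inputs are zero is a convention assumed here to avoid division by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both metrics are zero
    return 2 * precision * recall / (precision + recall)


print(f1_score(0.5, 0.5))  # 0.5
```

Because the harmonic mean is pulled toward the smaller value, a model with high precision but low recall (or the reverse) scores lower than the plain average would suggest.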
-
Generative Adversarial Networks (GANs)
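A class of artificial intelligence algorithms used in unsupervised machine learning, composed of two neural networks trained in competition: the generator, which produces synthetic samples, and the discriminator, which tries to distinguish those samples from real data.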
-
Genetic Algorithms
A search heuristic inspired by the process of natural selection, used to find optimal solutions to optimization and search problems.
-
Gradient Descent
An optimization algorithm that minimizes the loss function by iteratively adjusting the parameters of a model in the direction of steepest descent, that is, opposite to the gradient of the loss.
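A minimal sketch of the idea on a one-parameter problem, minimizing f(x) = (x - 3)^2, whose gradient is 2 * (x - 3). The function name, learning rate, and step count are illustrative assumptions.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# Gradient of f(x) = (x - 3)^2; the minimum is at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))  # 3.0
```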
-
Hallucinations
In the context of AI, hallucinations are outputs that a model presents as plausible or factual but that are incorrect, fabricated, or unsupported by its input or training data.
-
Hierarchical Reinforcement Learning
A machine learning technique that applies reinforcement learning to hierarchical tasks, enabling agents to learn and navigate complex environments efficiently.
-
Hybrid AI
The integration of multiple AI techniques or approaches, such as symbolic AI and machine learning, to solve complex problems.
-
Hyperparameters
Parameters that define the structure and behavior of a machine learning model and are set before the learning process begins rather than learned from the data, such as the learning rate or the number of layers.
-
Inference
The process of drawing conclusions from data or models based on observed evidence or prior knowledge. In machine learning, it usually means running a trained model on new data to produce predictions.
-
Inference Engine
A component of artificial intelligence systems that applies logical rules to interpret and reason about data.
-
Insight Engines
Systems that use artificial intelligence and natural language processing to discover insights and patterns within data.
-
Intelligent Document Extraction and Processing (IDEP)
The automated extraction and processing of information from unstructured documents using artificial intelligence.
-
Intelligent Document Processing (IDP)
The use of artificial intelligence technologies to automate the extraction, classification, and processing of information from documents.
-
Internet of Things (IoT)
A network of interconnected devices embedded with sensors, software, and other technologies to exchange data and perform actions autonomously.
-
Kernel Methods
A class of algorithms for pattern analysis and machine learning, based on defining and manipulating similarity functions in high-dimensional spaces.
-
K-nearest Neighbors (KNN)
A non-parametric algorithm used for classification and regression tasks that relies on the similarity of data points in a feature space.
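A minimal classification sketch, assuming Euclidean distance as the similarity measure and a majority vote among the k closest training points. The function name and the toy data are illustrative.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    # Sort training examples by Euclidean distance to the query point.
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in neighbors[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))  # a
print(knn_predict(points, labels, (5.5, 5.5)))  # b
```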
-
Knowledge Graph
A graph-based knowledge representation that captures relationships between entities and their attributes in a semantic network.
-
Knowledge Model
A formal representation of knowledge, typically structured in a way that is understandable by computers and used for reasoning and problem-solving.
-
Labelled Data
Data that has been manually annotated with one or more labels, typically used for supervised machine learning tasks.
-
LangOps (Language Operations)
The operationalization and management of language-related processes, tools, and workflows.
-
Language Data
Data specifically related to language, including text corpora, speech recordings, and linguistic annotations.
-
Large Language Models (LLMs)
Advanced natural language processing models with millions or billions of parameters, capable of generating human-like text.
-
Latent Variable
A variable that is not directly observed but inferred from other variables, used in statistical models to represent hidden factors.
-
Lemma
The base or dictionary form of a word, often used in natural language processing and linguistic analysis.
-
Lexicon
A complete set of meaningful units in a language, including words, morphemes, and phrases, with associated semantic information.
-
Linked Data
A method of publishing structured data so that it can be interlinked and become more useful through semantic queries.
-
Long Short-Term Memory (LSTM)
A type of recurrent neural network architecture capable of learning long-term dependencies in sequential data.
-
Machine Learning (ML)
A subset of artificial intelligence focused on the development of algorithms and statistical models that enable computers to learn and improve from experience.
-
Mean Squared Error (MSE)
A measure of the average squared difference between predicted and actual values, commonly used to evaluate regression models.
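Written out, MSE = (1/n) * sum((actual_i - predicted_i)^2). A minimal sketch with illustrative names and data:

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Only the last prediction is off, by 2, so the MSE is 4 / 3.
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```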
-
Metadata
Data that provides information about other data, such as the structure, format, or characteristics of a dataset.
-
Model
A mathematical representation of a real-world process or system used to make predictions, classify data, or gain insights.
-
Model Compression
The process of reducing the size of a machine learning model without significant loss in performance, often to enable deployment on resource-constrained devices.
-
Model Drift
The phenomenon where the performance of a machine learning model deteriorates over time due to changes in the underlying data distribution.
-
Model Parameter
A configuration variable or weight in a machine learning model that is learned from training data and used to make predictions.
-
Morphological Analysis
The process of analyzing the structure and form of words in a language to understand their grammatical properties and relationships.
-
Natural Language Processing (NLP)
A branch of artificial intelligence that focuses on the interaction between computers and humans through natural language.
-
Natural Language Understanding
The ability of a computer program to comprehend and interpret human language in a meaningful way.
-
Neural Architecture Search
The process of automatically finding the optimal architecture or configuration for a neural network.
-
NLG (Natural Language Generation)
The process of producing human-like text or speech from structured data or pre-defined templates using artificial intelligence.
-
NLQ (Natural Language Query)
The capability of querying databases or systems using natural language instead of traditional programming languages or query languages.
-
NLT (Natural Language Technology)
Technology that enables computers to interact with users using natural language, including understanding, generating, and processing text.
-
One-shot Learning
A machine learning approach where a model learns to recognize a class from a single example (or, in the closely related few-shot setting, a handful of examples), mimicking the way humans can generalize from very little data.
-
Ontology
A formal representation of knowledge that defines the concepts, relationships, and properties within a domain.
-
Outlier Detection
The process of identifying anomalies or outliers in a dataset that deviate from the norm or expected behavior.
-
Overfitting
The phenomenon where a machine learning model learns to fit the training data too closely, leading to poor generalization and performance on unseen data.
-
Parsing
The process of analyzing the grammatical structure of a sentence to determine its syntactic components and relationships.
-
Part-of-Speech Tagging
The process of assigning grammatical categories (such as noun, verb, adjective) to words in a sentence.
-
PEMT (Post Edit Machine Translation)
The process of manually correcting or improving machine-translated text to ensure accuracy and fluency.
-
Post-processing
The manipulation or enhancement of data or output after it has been generated by a machine learning model or algorithm.
-
Precision
A metric that measures the proportion of true positive predictions among all positive predictions made by a model.
-
Precision and Recall
Two complementary metrics used to evaluate the performance of classification models, with precision measuring the accuracy of positive predictions and recall measuring the proportion of actual positives that were correctly identified by the model.
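In terms of counts, precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP, FP, and FN are true positives, false positives, and false negatives. The sketch below assumes binary labels with `positive=1` as the positive class; the names are illustrative.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Here TP = 2, FP = 1, FN = 1, so both metrics equal 2 / 3.
precision, recall = precision_recall([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print(precision, recall)
```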
-
Precision-Recall Curve
A graphical representation of the trade-off between precision and recall for different thresholds of a classification model.
-
Pre-processing
The manipulation or transformation of raw data before it is fed into a machine learning algorithm or model.
-
Principal Component Analysis (PCA)
A dimensionality reduction technique used to transform high-dimensional data into a lower-dimensional space while preserving the most important features or patterns.
-
Prompt Engineering
The process of designing or refining prompts or input examples to guide the behavior of language models or AI systems.
-
Q-Learning
A model-free reinforcement learning technique where an agent learns to make decisions by iteratively updating a value function based on the expected return of actions taken in different states.
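The standard tabular update is Q(s, a) <- Q(s, a) + alpha * (r + gamma * max over a' of Q(s', a') - Q(s, a)). The sketch below applies one such update; the nested-dictionary table layout, the function name, and the state and action labels are assumptions made for illustration.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    # Value of the best action in the next state; 0.0 if it is unknown or terminal.
    next_values = q_table.get(next_state)
    best_next = max(next_values.values()) if next_values else 0.0
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
# Target = 1.0 + 0.9 * 2.0 = 2.8; moving halfway from 0.0 gives about 1.4.
print(q_update(q, "s0", "right", reward=1.0, next_state="s1", alpha=0.5))
```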
-
Quantum Machine Learning
The intersection of quantum computing and machine learning, exploring algorithms and models that leverage quantum computing principles to solve complex problems.
-
Random Forest
An ensemble learning method that constructs multiple decision trees during training and outputs the mode (for classification) or mean (for regression) of the individual trees' predictions as the final prediction.
-
Recall
A metric that measures the proportion of all actual positive cases that a model correctly identified.
-
Recurrent Neural Networks (RNN)
A class of neural networks designed to process sequential data by maintaining an internal state or memory, allowing them to capture temporal dependencies.
-
Regularization
A technique used to prevent overfitting in machine learning models by adding a penalty term to the loss function that discourages complex or extreme parameter values.
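One common form is L2 (ridge) regularization, where the penalty is the sum of squared weights scaled by a strength factor. The sketch below assumes that form; `lam` is a hypothetical name for the regularization strength.

```python
def l2_regularized_loss(base_loss, weights, lam=0.01):
    """Add an L2 penalty, lam * sum(w^2), to an already computed loss."""
    penalty = lam * sum(w * w for w in weights)
    return base_loss + penalty


# Larger weights incur a larger penalty for the same base loss.
print(l2_regularized_loss(1.0, [1.0, 2.0], lam=0.1))    # about 1.5
print(l2_regularized_loss(1.0, [10.0, 20.0], lam=0.1))  # about 51.0
```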
-
Reinforcement Learning
A machine learning paradigm where an agent learns to make decisions by interacting with an environment, receiving feedback in the form of rewards or penalties.
-
Relations
In the context of data and knowledge representation, relations refer to the connections or associations between entities or concepts.
-
Responsible AI
The ethical and accountable development, deployment, and use of artificial intelligence systems, considering their societal impacts and implications.
-
Rules-based Machine Translation (RBMT)
An approach to machine translation that relies on explicit linguistic rules and patterns to translate text between languages.
-
SAO (Subject-Action-Object)
A syntactic structure used to represent relationships between entities in a sentence, consisting of a subject performing an action on an object.
-
Self-Supervised Learning
A machine learning paradigm where models learn to represent data without human-labeled supervision, often by predicting masked or corrupted input samples.
-
Semantic Network
A graph-based knowledge representation where nodes represent concepts or entities, and edges represent relationships or connections between them.
-
Semantics
The study of meaning in language, including the interpretation of words, phrases, and sentences in context.
-
Semantic Search
A search technique that considers the meaning and context of user queries and documents to retrieve relevant results, often using natural language processing and semantic analysis.
-
Semi-structured Data
Data that does not conform to a rigid schema or structure but contains some organizational elements, such as tags, keys, or attributes.
-
Semi-Supervised Learning
A machine learning approach that combines both labeled and unlabeled data for training, leveraging the abundance of unlabeled data and a smaller set of labeled examples.
-
Sentiment
The emotional tone or attitude expressed in a piece of text, speech, or communication.
-
Sentiment Analysis
The process of identifying, extracting, and quantifying sentiment from text data to determine the emotional tone or attitude expressed.
-
Similarity (and Correlation)
Similarity refers to the measure of resemblance or likeness between two objects or entities, while correlation measures the degree of relationship between two variables.
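Cosine similarity is one widely used similarity measure for vectors such as embeddings: it is the dot product of two vectors divided by the product of their lengths. The sketch below is illustrative; returning 0.0 for a zero-length vector is an assumed convention.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal vectors)
print(cosine_similarity([1, 2], [2, 4]))  # close to 1.0 (same direction)
```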
-
Simple Knowledge Organization System (SKOS)
A standard vocabulary for representing knowledge organization systems, providing a model for expressing the semantics of concepts and relationships.
-
Speech Analytics
The process of analyzing spoken language to extract insights, patterns, and actionable information from audio recordings or live speech.
-
Speech Recognition
The ability of a computer program or system to transcribe spoken words or phrases into text.
-
Stochastic Gradient Descent (SGD)
An optimization algorithm that minimizes the loss function by updating the model parameters at each iteration using the gradient computed on a single randomly selected training example or a small random subset (mini-batch), rather than on the full dataset.
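A minimal sketch that fits the slope w of the model y = w * x under squared error, using a randomly shuffled mini-batch for each update. The function name, learning rate, batch size, and seed are illustrative assumptions.

```python
import random


def sgd_fit_slope(xs, ys, learning_rate=0.01, epochs=50, batch_size=2, seed=0):
    """Estimate w in y = w * x with mini-batch stochastic gradient descent."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit examples in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * grad
    return w


# The data follow y = 2x exactly, so the estimate should approach 2.
print(round(sgd_fit_slope([1, 2, 3, 4], [2, 4, 6, 8]), 3))  # 2.0
```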
-
Structured Data
Data that is organized in a fixed format or schema, typically stored in databases or structured files with well-defined rows and columns.
-
Supervised Learning
A machine learning approach where models are trained on labeled data, with input-output pairs provided during the training process to learn the mapping between inputs and outputs.
-
Symbolic Methodology
An approach to artificial intelligence that emphasizes the use of symbols, logic, and rules to represent knowledge and perform reasoning.
-
Syntax
The set of rules governing the structure and arrangement of words in a language, including grammar, word order, and sentence structure.
-
Tagging
The process of assigning labels or tags to words, phrases, or documents to categorize or annotate them for analysis or organization.
-
Taxonomy
A hierarchical classification system used to organize and categorize concepts, topics, or entities based on their relationships and characteristics.
-
Test Set
A subset of data used to evaluate the performance of a machine learning model after it has been trained on a training set, helping to assess generalization and model quality.
-
Text Analytics
The process of extracting insights and meaningful information from unstructured text data, including text mining, natural language processing, and sentiment analysis.
-
Text Summarization
The process of generating concise and coherent summaries of longer texts while preserving the key information and meaning.
-
Thesauri
Collections of words grouped together as synonyms, related concepts, or hierarchical relationships, used to expand or refine search queries and improve information retrieval.
-
Time Series Analysis
The process of analyzing and modeling sequential data points collected over time to identify patterns, trends, or anomalies.
-
Tokens
Individual units of language, such as words, phrases, or symbols, that are extracted or processed as discrete elements in natural language processing tasks.
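A minimal sketch of word-level tokenization using a regular expression that separates words from punctuation. This is only one simple scheme; production systems often use more elaborate tokenizers, including ones that split text into subword units.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


print(tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```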
-
Training Set
A subset of data used to train a machine learning model, consisting of input-output pairs that the model learns from during the training process.
-
Transfer Learning
A machine learning technique where knowledge gained from training on one task or dataset is transferred and applied to a different but related task or dataset, often to improve performance or reduce the need for labeled data.
-
Transformer
A type of deep learning model architecture based on self-attention mechanisms, commonly used in natural language processing tasks such as translation and text generation.
-
Treemap
A visualization technique used to display hierarchical data structures as nested rectangles, with the area of each rectangle proportional to the data it represents.
-
Triplet Relations (Subject Action Object (SAO))
A syntactic structure used to represent relationships between entities in a sentence, consisting of a subject performing an action on an object, often used in natural language processing and knowledge representation.
-
Tuning (Model Tuning or Fine Tuning)
The process of adjusting the hyperparameters or parameters of a machine learning model to optimize its performance on a specific task or dataset.
-
Unbalanced Dataset
A dataset where the distribution of classes or categories is skewed, with some classes having significantly more samples than others, which can pose challenges for machine learning models.
-
Unstructured Data
Data that lacks a predefined data model or organization, often in the form of text, images, audio, or video, requiring specialized techniques for analysis and processing.
-
Unsupervised Learning
A machine learning approach where models learn patterns or structures from unlabeled data without explicit supervision, typically used for clustering, dimensionality reduction, or generative modeling.
-
Validation Set
A subset of data used to evaluate the performance of a machine learning model during training, often used to tune hyperparameters and assess generalization before testing on unseen data.
-
Variational Autoencoder (VAE)
A type of generative model that combines the principles of autoencoders and variational inference to learn a latent representation of data and generate new samples.
-
Variational Inference
A method used to approximate complex probability distributions by transforming them into simpler distributions that are easier to work with, commonly used in Bayesian inference and generative modeling.
-
Zero-Coding Machine Learning
An approach to machine learning that automates the process of model building and deployment without requiring manual coding or programming by users, often using graphical user interfaces or drag-and-drop tools.
-
Zero-shot Learning
A machine learning paradigm where models are trained to recognize classes or concepts they have not been explicitly exposed to during training, often using auxiliary information or transfer learning techniques.