
Glossary Of Data Analytics, Data Science, Big Data and Digital Marketing Terminology and Jargon

Over 1,250 Terms Every Data Analyst, Data Scientist and Marketer Should Know

Whether you’re into data analytics, data science or digital marketing, the complex world of data and analytics can be a mess of confusing terminology and jargon. DatasetsDB’s comprehensive glossary defines over 1,250 terms to help you decode the puzzle. Below you will find an alphabetical index of data analytics, big data, data science and digital marketing terms.


Accuracy Accuracy is a metric for evaluating how well a machine learning classification model performs. It is the ratio of correct predictions to the total number of predictions. The formula is as follows: Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives)
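The accuracy formula can be sketched in a few lines of Python. The function name and the confusion-matrix counts below are illustrative assumptions, not values from this glossary.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to score")
    return (tp + tn) / total


# Hypothetical counts for a binary classifier evaluated on 100 examples.
print(accuracy(tp=40, tn=45, fp=5, fn=10))
```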
ACID Test A test applied to data for atomicity, consistency, isolation and durability.
Acquisition You can understand how people find your website using the Acquisition reports. The reports present data based on the source and medium of your users, along with other acquisition dimensions. There are dedicated reports for your paid traffic from Google AdWords, organic traffic from Google (if you have linked your Google Search Console account), traffic from social networks and traffic from custom campaign tags.
Actions An action is any activity a user takes when interacting with your website, application or product. This could include email signups, page views, downloads, video plays, purchases or any other activity that you would like to track.
Active Users The Real Time and Home reports show you how many people are currently viewing content on your website. Data is processed into the Real Time reports within a few seconds, and you can view data for the previous 30 minutes. The Active Users report (under ‘Audience’), by contrast, tells you the number of unique users who had sessions on your website within a certain number of days.
Active Pages When viewing the Real Time reports, Active Pages shows you the pages people are currently viewing on your website. When someone navigates to another page or closes their browser the page that was shown as active will be removed from the Real Time reports.
Activity-based cost (ABC) Activity-based cost (ABC) accounting procedures that can quantify the true profitability of different activities by identifying their actual costs.
Ad Hoc Query The ability to create a one-off, "on demand" report from BI or analytics software that answers a specific business question.
Ad hoc analytics Ad hoc analytics is a business intelligence process designed to answer a single or specific business question. The product of ad hoc analysis is typically a statistical model, analytic report, or other type of data summary.
Adam Optimization The Adam optimization algorithm is an extension to stochastic gradient descent that has recently seen broader adoption for deep learning applications in computer vision and natural language processing. It is used to compute adaptive learning rates for each parameter.
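As a rough illustration of how Adam adapts the step size for a parameter, the sketch below applies the commonly published update rule to a single scalar. The function name `adam_step`, the toy objective and the learning rate are assumptions chosen for the example; real deep learning libraries apply the same idea to every parameter of a model.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy objective (an assumption for illustration): minimize f(x) = (x - 3)^2.
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (x - 3)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.05)
print(x)  # should end up close to 3
```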
Adoption Adoption an individual’s decision to become a regular user of a product.
Advanced Analytics Advanced Analytics is the autonomous or semi-autonomous examination of data or content using sophisticated techniques and tools, typically beyond those of traditional business intelligence (BI). It is used to discover deeper insights, make predictions, or generate recommendations.
Advertising Advertising any paid form of nonpersonal presentation and promotion of ideas, goods, or services by an identified sponsor.
Advertising objective Advertising objective a specific communications task and achievement level to be accomplished with a specific audience in a specific period of time.
Aggregation Data aggregation is a type of data and information mining process where data is searched, gathered and presented in a report-based, summarized format to achieve specific business objectives or processes and/or conduct human analysis.
Aggregate data Aggregate data formed or calculated by the combination of many separate units or items.
Agile Agile A methodology cribbed from software development that now sees application in many areas of business. Agile aims to help teams respond to unpredictability through incremental, iterative work cadences and shortened feedback loops (e.g. using short, daily meetings where project workers describe what they are working on.) Agile methodologies are an alternative to waterfall, or traditional sequential development.
Alexa Rank Your website’s Alexa Rank is calculated by the average number of page views and visitors your site receives within a 90-day period. By monitoring your Alexa Rank, you can determine how your website traffic is comparing against competition and other leaders in your industry.
Algorithm In computer science and mathematics, an algorithm is an unambiguous, step-by-step specification of how to solve a problem or perform a task such as data analysis. It consists of a finite sequence of operations applied to data in order to solve a particular problem.
All Interactions Interaction is a kind of action that occurs when two or more objects have an effect upon one another. The idea of a two-way effect is essential in the concept of interaction, as opposed to a one-way causal effect.
Alpha Risk Alpha risk is defined as the risk of rejecting the null hypothesis when in fact it is true. Synonymous with: Type I error, producer’s risk. In other words, stating a difference exists where actually there is none. Alpha risk is stated in terms of probability (such as 0.05 or 5%).
Alternative Hypothesis (Ha) Statement of a change or difference; assumed to be true if the null hypothesis is rejected.
Analytic Database An analytic database is a database management system that is optimized for business analytics applications and services. It is specifically designed to support business intelligence (BI) and analytic applications, typically as part of a data warehouse or data mart.
Analysis Analysis is a branch of mathematics which studies continuous changes and includes the theories of integration, differentiation, measure, limits, analytic functions and infinite series. It is the systematic study of real and complex-valued continuous functions. It describes both the discipline of which calculus is a part and one form of the abstract logic theory.
Analytics Analytics is the scientific process of discovering and communicating the meaningful patterns which can be found in data.
Analysis Services Analysis Services Also known as Microsoft SQL Server Analysis Services, SSAS, and sometimes MSAS. Analysis Services is an online analytical data engine used in decision support and business analytics. It provides the analytical data for business reports and client applications such as Power BI, Excel, Reporting Services reports, and other data visualization tools. Analysis Services is used by organizations to analyze and make sense of information that could be spread out across multiple databases, or in disparate tables or files.
Anchoring and adjustment heuristic Anchoring and adjustment heuristic when consumers arrive at an initial judgment and then make adjustments of their first impressions based on additional information.
Annotations These are notes on specific parts of a chart in Google Analytics to help you better keep track of certain things that happen in your marketing. Usually, these are left on parts of a graph that are outliers. All people using your Google Analytics account can review annotations at any time.
Anomaly Detection Anomaly detection is the identification of data points, items, observations or events that do not conform to the expected pattern of a given group. These anomalies occur very infrequently but may signify a large and significant threat such as cyber intrusions or fraud.
Anonymization Making data anonymous; severing of links between people in a database and their records to prevent the discovery of the source of the records.
ANOVA One-way ANOVA is a generalization of the 2-sample t-test, used to compare the means of more than two samples to each other.
ANOVA Table The ANOVA table is the standard method of organizing the many calculations necessary for conducting an analysis of variance.
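The quantities that an ANOVA table organizes (sums of squares, degrees of freedom, mean squares and the F statistic) can be computed by hand for a one-way design. The sketch below uses only the Python standard library; the function name and the three sample groups are made up for illustration.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the main entries of a one-way ANOVA table as a dict."""
    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total observations
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": df_within, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical measurements from three samples.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table["F"])
```

A large F value suggests that the group means differ by more than the within-group variation would explain; turning F into a p-value requires the F distribution, which is not covered here.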
Apache Spark Apache Spark is an open source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics and data processing workloads.
API (Application Program Interface) An application programming interface (API) is a set of protocols, routines, functions and/or commands that programmers use to develop software or facilitate interaction between distinct systems. APIs are available for both desktop and mobile use, and are typically useful for programming GUI (graphic user interface) components, as well as allowing a software program to request and accommodate services from another program.
Application Application software is a program or group of programs designed for end users. Software is divided into two classes: system software and application software. While system software consists of low-level programs that interact with computers at a basic level, application software resides above system software and includes applications such as database programs, word processors and spreadsheets. Application software may be bundled with system software or published alone.
Application Programming Interface (API) An application program interface (API) is code that allows two software programs to communicate with each other. An API defines the correct way for a developer to request services from an operating system (OS) or other application and expose data within different contexts and across multiple channels. In the early days of Web 2.0, the concept of integrating data and applications from different sources was called a mashup.
Arm’s-length price Arm’s-length price the price charged by other competitors for the same or a similar product.
Artificial Intelligence (AI) Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions) and self-correction. Particular applications of AI include expert systems, speech recognition and machine vision.
Artificial general intelligence (AGI) Artificial general intelligence (AGI), also known as strong AI, is a type of artificial intelligence that is considered human-like. It is still in its preliminary stages and remains largely hypothetical at present.
Artificial narrow intelligence (ANI) Artificial narrow intelligence (ANI), also known as weak AI, is a type of artificial intelligence that can only focus on one task or problem at a given time (e.g. playing a game against a human competitor). This is the form of AI that exists today.
Artificial neural network (ANN) Artificial neural network (ANN) a network modeled after the human brain by creating an artificial neural system via a pattern-recognizing computer algorithm that learns from, interprets, and classifies sensory data
Aspirational groups Aspirational groups groups a person hopes or would like to join.
Assisted Conversion Inside the 'Multi-Channel Funnels' reports you will find assisted conversions which show you the channels which later led to a conversion. For example, if a user came to the website from Twitter and then later from Google AdWords, Twitter would be counted as an ‘assisted conversion’. The reports also allow you to view assisted conversions based on other dimensions, including campaign, source, medium, landing page and more.
Associative network memory model Associative network memory model a conceptual representation that views memory as consisting of a set of nodes and interconnecting links where nodes represent stored information or concepts and links represent the strength of association between this information or concepts.
Attribution Model An attribution model tells your analytics program how you want to weigh the importance of different touchpoints. For example, if you want every single page to be given equal weight (or credit) for the conversion, you will choose an All Interactions Model for HubSpot and a Linear Model for Google Analytics. If you want only the first page a visitor ever saw before they ultimately converted, you would choose a First Interaction (or First Touch) model.
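The difference between the two models mentioned above can be sketched as two small functions that split conversion credit across a visitor's touchpoints. The function names and the sample path are hypothetical; analytics tools implement these models internally.

```python
def linear_credit(touchpoints):
    """Linear (all-interactions) model: every touchpoint gets an equal share."""
    share = 1 / len(touchpoints)
    credit = {}
    for page in touchpoints:
        credit[page] = credit.get(page, 0) + share
    return credit


def first_interaction_credit(touchpoints):
    """First Interaction model: the first touchpoint gets all the credit."""
    return {touchpoints[0]: 1.0}


# Hypothetical path of pages a visitor saw before converting.
path = ["blog post", "pricing page", "signup page", "pricing page"]
print(linear_credit(path))
print(first_interaction_credit(path))
```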
Attribution Report An attribution report is used to understand the journey someone takes from the first time they set foot on your website to the time they become a customer -- basically, measuring the conversion path to see what made someone convert.
Attribution Attribution links an established set of user actions to a desired result and then assigns a value to each of those outcomes. Marketers use attribution models and reporting to understand how their marketing activities are performing relative to business results. For example, if my goal was to learn if the blog posts I’m publishing are delivering qualified leads that convert, I could set up an attribution report to monitor how many people read my blog posts, sign up for a free trial and then convert to a paying customer.
 Attitude Attitude a person’s enduring favorable or unfavorable evaluation, emotional feeling, and action tendencies toward some object or idea.
 Audiences You can configure custom audiences to see more granular metrics inside your reports. For example, if you’re considering running a remarketing campaign you can create an audience to monitor current performance before you begin advertising. You can find the Audiences report under ‘Audience’.
Augmented product Augmented product a product that includes features that go beyond consumer expectations and differentiate the product from competitors.
Autoregression Autoregressive models and processes are stochastic models in which future values are estimated based on a weighted sum of past values. In this technique the input variables are observations at previous time steps, called lag variables. For example, with two lags: X(t) = b0 + b1*X(t-1) + b2*X(t-2)
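A minimal sketch of a two-lag autoregressive forecast follows, assuming the common form in which the next value is a weighted sum of the two most recent observations. The coefficients b0, b1 and b2 are made-up numbers; in practice they are estimated from historical data.

```python
def ar2_forecast(history, b0, b1, b2, steps=1):
    """Forecast `steps` values ahead, feeding each prediction back in as a lag."""
    values = list(history)
    forecasts = []
    for _ in range(steps):
        nxt = b0 + b1 * values[-1] + b2 * values[-2]
        forecasts.append(nxt)
        values.append(nxt)
    return forecasts


# Hypothetical series and coefficients.
series = [10.0, 12.0, 13.0]
print(ar2_forecast(series, b0=1.0, b1=0.6, b2=0.3, steps=2))
```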
Automatic Identification and Data Capture (AIDC) Automatic identification and data capture (AIDC) refers to the method used to identify objects through computing algorithms. For example, bar codes, radio-frequency identification (RFID), biometrics, magnetic strips, optical character recognition (OCR), smart cards and voice recognition technologies all include algorithms identifying objects captured using a still image capturing system, audio or video.
Automation An automation is a triggered response to a user’s activity or behavior. For example, if a visitor to your site downloaded an eBook, the automation that triggers could be to add them to an email campaign or create a lead in your CRM.
Available market Available market the set of consumers who have interest, income, and access to a particular offer.
Availability heuristic Availability heuristic when consumers base their predictions on the quickness and ease with which a particular example of an outcome comes to mind.
Average Session Duration Provides a top-level view of how long users are spending on your website. For example, if you had two users, one that spent three minutes on your website and another that spent one minute, then you would have an average session duration of two minutes. Google Analytics does not count time for the last page viewed during a session. This means that the average session duration will tend to be skewed lower than the actual amount of time people are spending on your website.
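The effect of not counting the last page can be illustrated with a simplified model in which a session's duration is the time between its first and last pageview. This is a sketch under that assumption, not Google Analytics' actual implementation; the timestamps are invented.

```python
def session_duration(pageview_times):
    """Seconds between the first and last pageview; time on the last page is unknown."""
    return pageview_times[-1] - pageview_times[0]


def average_session_duration(sessions):
    return sum(session_duration(s) for s in sessions) / len(sessions)


# Pageview timestamps in seconds since each session started (hypothetical).
sessions = [
    [0, 60, 180],   # measured as three minutes
    [0, 60],        # measured as one minute
]
print(average_session_duration(sessions))   # two minutes, expressed in seconds
print(session_duration([0]))                # a single-page session measures as 0
```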
Average cost Average cost the cost per unit at a given level of production; it is equal to total costs divided by production.
Avro Avro is an open source project that provides data serialization and data exchange services for Apache Hadoop. These services can be used together or independently. Avro facilitates the exchange of big data between programs written in any language. With the serialization service, programs can efficiently serialize data into files or into messages. The data storage is compact and efficient. Avro stores both the data definition and the data together in one message or file.


Backpropagation Backpropagation is a supervised learning algorithm for training multi-layer perceptrons. It propagates the prediction error backward through the network so that the model can adjust its weights based on how it performed. Backpropagation is sometimes called the “backpropagation of errors.”
Back End The back end is all of the code and technology that works behind the scenes to populate the front end with useful information. This includes databases, servers, authentication procedures, and much more. You can think of the back end as the frame, the plumbing, and the wiring of an apartment.
Backward invention Backward invention reintroducing earlier product forms that can be well adapted to a foreign country’s needs.
Bagging "Bagging", or bootstrap aggregation, is an ensemble learning technique that trains several machine learning models on different training sets and combines their predictions. The training sets are drawn so that some observations may be repeated between different training sets. Algorithms that use the bagging technique include the bagging meta-estimator and random forest.
Balanced scorecard Balanced scorecard a performance management tool that holistically captures an organization’s performance from several vantage points (e.g. sales results vs. inventory levels) on a single page.
Banner ads (Internet) Banner ads (Internet) small, rectangular boxes containing text and perhaps a picture to support a brand.
Bar Chart A bar chart is a type of graph that is used to display and compare the number, frequency or other measure (e.g. mean) for different discrete categories of data.
Batch Processing Batch data processing is an efficient way of processing high volumes of data where a group of transactions is collected over a period of time. Hadoop is focused on batch data processing.
Bayes Theorem Bayes' theorem is a formula that describes how to update the probabilities of hypotheses when given evidence. It follows directly from the definition of conditional probability, but can be used to reason powerfully about a wide range of problems involving belief updates: P(A|B) = P(B|A) * P(A) / P(B). For example, suppose a clinic wants to estimate how likely a visiting patient is to have cancer. Let A represent the event “Person has cancer” and B the event “Person is a smoker”. Bayes' theorem then gives P(A|B), the probability that a person has cancer given that they are a smoker.
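Using the events A (“Person has cancer”) and B (“Person is a smoker”) from the example, the update can be sketched as below. All three input probabilities are invented for illustration and are not real medical statistics.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical inputs (assumptions, not real data):
p_cancer = 0.01                 # P(A): share of patients who have cancer
p_smoker = 0.20                 # P(B): share of patients who smoke
p_smoker_given_cancer = 0.60    # P(B|A): share of cancer patients who smoke

print(bayes_posterior(p_smoker_given_cancer, p_cancer, p_smoker))
```

With these made-up inputs, learning that a patient smokes raises the estimated probability of cancer from 1% to about 3%.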
Bayesian Statistics Bayesian statistics is a system for describing epistemological uncertainty using the mathematical language of probability. In the 'Bayesian paradigm,' degrees of belief in states of nature are specified; these are non-negative, and the total belief in all states of nature is fixed to be one. Bayesian statistical methods start with existing 'prior' beliefs, and update these using data to give 'posterior' beliefs, which may be used as the basis for inferential decisions.
Bayesian networks Bayesian networks also known as Bayes network, Bayes model, belief network, and decision network, is a graph-based model representing a set of variables and their dependencies.
Behavioral Analytics Behavioral analytics is a subset of business analytics which focuses on finding out how and why people behave the way they do when using eCommerce platforms, social media sites, online games, and any other web application. Behavioral analytics takes business analytics’ broad focus and narrows it down, which allows one to take what seem to be unrelated data points and then extrapolate, determine errors, and predict future trends. All of this is done using the data exhaust generated by users.
Behavioral Targeting Behavioral targeting is an advertising technique that monitors the behavior of a user to deliver more personalized, relevant messaging. If a marketer is collecting the right data, they can use behavioral targeting to engage customers and prospects based on a variety of information including purchase history, browsing activity, in-app activity and more to deliver truly customized messaging.
Beta Risk The risk or probability of making a Type II error.
BI application designer BI application designer Someone responsible for designing the initial reporting templates and dashboards in the front-end applications. They generally require a combined enthusiasm for data visualization, user experience design, and applications reporting. Typically, BI application designers become the source for ongoing front-end BI application support.
BI Project Sponsor BI Project Sponsor Ideally, a project sponsor is an executive level individual who understands the importance of BI projects, has compelling business motivation, and can help drive results. This person will be the project’s ultimate client and its strongest advocate. Not involved in the day to day of a project, but instead, they provide oversight, direction, and momentum.
Bias-Variance Trade-off Bias is the difference between the average prediction of our model and the correct value which we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the problem, which leads to high error on both training and test data. Variance is the variability of the model’s prediction for a given data point, which tells us how spread out the predictions are. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data. The trade-off is that lowering one usually raises the other, so the goal is a model that balances the two.

Big Data Big Data is a phrase used to mean a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques. In most enterprise scenarios the volume of data is too big or it moves too fast or it exceeds current processing capacity.
Big Data Scientist A big data scientist is a person who can take structured and unstructured data points and use their skills in statistics, maths, and programming to organize them. They apply their analytical abilities (contextual understanding, industry knowledge, and understanding of existing assumptions) to uncover hidden solutions for business development.
Binary Variable Binary variables are variables which only take two values. For example, Male or Female, True or False and Yes or No
Binomial Distribution The binomial distribution, in mathematics and statistics, describes the probability of obtaining a given number of successes in a fixed number of independent trials, where each trial has two possible outcomes, success or failure, and the same probability of success. For example, a coin toss has only two possible outcomes, heads or tails, and taking a test could have two possible outcomes, pass or fail.
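For the coin-toss example, the probability of a given number of heads can be computed with the standard binomial probability formula. The function name is an assumption; `math.comb` is part of the Python standard library (3.8 and later).

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Probability of exactly 2 heads in 4 tosses of a fair coin.
print(binomial_pmf(2, 4, 0.5))
```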
Biometrics Biometrics is the measurement and statistical analysis of people's unique physical and behavioral characteristics. The technology is mainly used for identification and access control, or for identifying individuals who are under surveillance. The basic premise of biometric authentication is that every person can be accurately identified by his or her intrinsic physical or behavioral traits. The term biometrics is derived from the Greek words bio meaning life and metric meaning to measure.
Black Box Algorithms A black box algorithm is where the result of, or process behind, an algorithm’s decision making is unknown, not understood, or difficult to explain.
Boosting Boosting is a machine learning ensemble meta-algorithm primarily for reducing bias, and also variance, in supervised learning. It is a family of machine learning algorithms that convert weak learners into strong ones. Boosting algorithms include AdaBoost, GBM, XGBoost, LightGBM and CatBoost.
Bootstrapping Bootstrapping is the process of drawing multiple samples from a dataset with replacement. Each sample is the same size as the original dataset. These samples are called bootstrap samples.
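Sampling with replacement is straightforward to sketch with the standard library. The function name, the seed and the data values are illustrative assumptions.

```python
import random


def bootstrap_samples(data, n_samples, seed=None):
    """Draw n_samples bootstrap samples, each the same size as data, with replacement."""
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(n_samples)]


observations = [3, 7, 8, 12, 15]
for sample in bootstrap_samples(observations, n_samples=3, seed=42):
    print(sample)   # some observations may repeat, others may be missing
```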
Bounce Rate The bounce rate is the percentage of people who land on one of your web pages and then leave without clicking to anywhere else on your website -- in other words, single-page visitors. This metric is found in Google Analytics.
Bounce A bounce is reported when a user’s session only contains a single pageview. The idea is that someone comes to your website and they ‘bounce’ away and leave after only viewing a single page.
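Given the number of pageviews in each session, bounces and the bounce rate can be computed as follows. This is a simplified sketch; the function name and the session data are made up.

```python
def bounce_rate(pageviews_per_session):
    """Percentage of sessions that contain exactly one pageview."""
    bounces = sum(1 for views in pageviews_per_session if views == 1)
    return 100 * bounces / len(pageviews_per_session)


# Four hypothetical sessions; two of them viewed a single page.
print(bounce_rate([1, 3, 1, 5]))
```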
Box Plot A box plot displays the five-number summary of a set of data. The five-number summary is the minimum, first quartile, median, third quartile, and maximum. In a box plot, we draw a box from the first quartile to the third quartile. A vertical line goes through the box at the median. The whiskers go from each quartile to the minimum or maximum.
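The five-number summary behind a box plot can be computed with the standard library. Quartile conventions differ between tools; this sketch assumes the "inclusive" method of `statistics.quantiles`, and the data values are invented.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


values = [7, 1, 9, 3, 5, 2, 8, 4, 6]
print(five_number_summary(values))
```

The box would span the second and fourth numbers, the line inside it sits at the third, and the whiskers reach out to the first and last.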
Brand Brand a name, term, sign, symbol, or design, or a combination of them, intended to identify the goods or services of one seller or group of sellers and to differentiate them from those of competitors.
Brand architecture Brand architecture see branding strategy.
Brand-asset management team (BAMT) Brand-asset management team (BAMT) key representatives from functions that affect the brand’s performance.
Brand Associations Brand Associations all brand-related thoughts, feelings, perceptions, images, experiences, beliefs, attitudes, and so on that become linked to the brand node.
Brand audit Brand audit a consumer-focused exercise that involves a series of procedures to assess the health of the brand, uncover its sources of brand equity, and suggest ways to improve and leverage its equity.
Brand awareness Brand awareness consumers’ ability to identify the brand under different conditions, as reflected by their brand recognition or recall performance.
Brand community Brand community a specialized community of consumers and employees whose identification and activities focus around the brand.
Brand contact Brand contact any information-bearing experience a customer or prospect has with the brand, the product category, or the market that relates to the marketer’s product or service.
Brand Development Index (BDI) Brand Development Index (BDI) the index of brand sales to category sales.
Brand dilution Brand dilution when consumers no longer associate a brand with a specific product or highly similar products or start thinking less favorably about the brand.
Brand elements Brand Elements those trademarkable devices that serve to identify and differentiate the brand such as a brand name, logo, or character.
Brand equity Brand equity the added value endowed to products and services.
Brand extension Brand extension a company’s use of an established brand to introduce a new product.
Brand image Brand image the perceptions and beliefs held by consumers, as reflected in the associations held in consumer memory.
Brand knowledge Brand knowledge all the thoughts, feelings, images, experiences, beliefs, and so on that become associated with the brand.
Brand line Brand line all products, original as well as line and category extensions, sold under a particular brand name.
Brand mix Brand mix the set of all brand lines that a particular seller makes available to buyers.
Brand personality Brand personality the specific mix of human traits that may be attributed to a particular brand.
Brand portfolio Brand portfolio the set of all brands and brand lines a particular firm offers for sale to buyers in a particular category.
Brand promise Brand promise the marketer’s vision of what the brand must be and do for consumers.
Brand-tracking studies Brand-tracking studies collect quantitative data from consumers over time to provide consistent, baseline information about how brands and marketing programs are performing.
Brand Valuation Brand Valuation an estimate of the total financial value of the brand.
Brand value chain Brand value chain a structured approach to assessing the sources and outcomes of brand equity and the manner in which marketing activities create brand value.
Branded entertainment Branded entertainment using sports, music, arts, or other entertainment activities to build brand equity.
Branded variants Branded variants specific brand lines uniquely supplied to different retailers or distribution channels.
Branding Branding endowing products and services with the power of a brand.
Branding Strategy Branding Strategy the number and nature of common and distinctive brand elements applied to the different products sold by the firm.
Breakeven Analysis Breakeven Analysis a means by which management estimates how many units of the product the company would have to sell to break even with the given price and cost structure.
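A common way to make this estimate, assumed here because the definition does not spell out a formula, is to divide fixed costs by the contribution each unit makes (price minus variable cost per unit). The function name and all figures are hypothetical.

```python
import math


def breakeven_units(fixed_costs, price, variable_cost_per_unit):
    """Units that must be sold for revenue to cover fixed and variable costs."""
    contribution = price - variable_cost_per_unit
    if contribution <= 0:
        raise ValueError("price must exceed variable cost per unit")
    return math.ceil(fixed_costs / contribution)   # round up to whole units


# Hypothetical cost structure.
print(breakeven_units(fixed_costs=10_000, price=25, variable_cost_per_unit=15))
```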
Brick-and-click Brick-and-click existing companies that have added an online site for information and/or e-commerce.
Business Analytics Business analytics refers to the skills, technologies and practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning.
Business Intelligence Business intelligence is a technological term that encompasses data, computing, and analytics within business operations. Much more than a specific “thing,” business intelligence is rather an umbrella term that covers the processes and methods of collecting, storing, and analyzing data from business operations or activities to optimize performance.
Business analyst Business Analyst Someone who analyzes an organization or business domain (real or hypothetical) and documents its processes and systems, assesses the business model, and determines integration with technology. Their solutions can be the use of technology architecture, tools, or software application. In a BI project, this person is often responsible for determining business needs and translating them into architectural data and application requirements.
Business Driver Business Driver This term can refer to a resource, process, or condition that is essential to the growth and continued success of a business. In a BI project, a business driver is also a role that is helpful when the sponsor is too far removed from the project team. The driver typically takes on the less strategic BI responsibilities. This role is usually filled by a middle manager who possesses the same characteristics as the sponsor.
Business lead Business Lead A mid-to-senior level resource who understands both the business and technical side of a company well enough to communicate between the two. Within the context of a BI project, they understand the requirements, obstacles, and issues of each to make decisions on various courses of action. Additionally, this person should be highly involved in a BI project – communicating with the project manager constantly. Sometimes the business driver fills this role.
Business database Business database complete information about business customers’ past purchases; past volumes, prices, and profits.
Business market Business market all the organizations that acquire goods and services used in the production of other products or services that are sold, rented, or supplied to others.


Call Detail Record (CDR) Analysis A call detail record (CDR) contains metadata (data about data) that a telecommunications company collects about phone calls, such as the length and time of the call. CDR analysis gives businesses exact details about when, where, and how calls are made, for billing and reporting purposes.
Calculated Metric Calculated metrics allow you to create your own metrics that are based on the default metrics available within your reports. For example, you can create your own calculated metric that divides goal completions by users to create a user goal conversion rate which is not the same as the default session-based goal conversion rate.
Campaign Name Campaign name is one of the four main dimensions (along with source, medium and channel) for reporting and analyzing marketing campaigns. The campaign name is provided when you use a campaign tagged URL for your inbound marketing or from your Google AdWords campaigns (when Google AdWords is linked to Google Analytics).
Campaign Tags Inbound marketing can be tracked and reported by Google Analytics using campaign tags. Extra details (query parameters) are added to the end of URLs which are then included in the Acquisition reports. Campaign tags include campaign name, source, medium, term and content.
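As a minimal sketch of how campaign tags end up on a URL, the snippet below appends the standard `utm_` query parameters that Google Analytics reads. The helper name `tag_url` and the example URL and values are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode, urlparse


def tag_url(base_url, source, medium, campaign, term=None, content=None):
    """Append campaign (UTM) query parameters to a URL."""
    params = {
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    }
    # Term and content are optional campaign tags.
    if term:
        params["utm_term"] = term
    if content:
        params["utm_content"] = content
    # Use "&" when the URL already carries a query string.
    separator = "&" if urlparse(base_url).query else "?"
    return base_url + separator + urlencode(params)


tagged = tag_url("https://example.com/landing", "newsletter", "email", "spring_sale")
print(tagged)
```

The values passed for source, medium and campaign are what would then show up as dimensions in the Acquisition reports.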
Capital items Capital items long-lasting goods that facilitate developing or managing the finished product.
Captive products Captive products products that are necessary to the use of other products, such as razor blades or film.
Cascading Cascading is the method for allowing more participants to enter a conference than a single multipoint control unit (MCU) can support. Cascading usually means connecting two separate MCUs, where the second MCU acts, and is treated, as another participant. The host MCU sends the joining MCU the processed data streams, which the joining MCU then distributes to its videoconference participants.
Categorical variable In statistics, a categorical variable is a variable that can take on one of a limited, and usually fixed number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property.
Category extension Category extension using the parent brand to brand a new product outside the product category currently served by the parent brand.
Category membership Category membership the products or sets of products with which a brand competes and which function as close substitutes.
Cause-related marketing Marketing that links a firm’s contributions to a designated cause to customers’ engaging directly or indirectly in revenue-producing transactions with the firm.
Cell Phone data Cell phone data has surfaced as one of the big data sources as it generates a tremendous amount of data and much of it is available for use with analytical applications.
Channel grouping This Google Analytics feature allows you to group marketing activities together. In the Acquisition reports you can, by default, view and compare metrics by channel name, traffic source, medium, or campaign name. You can also set up custom channel groups.
Change history You can view changes made to your Google Analytics account, properties and views by navigating to ‘Admin’ and selecting ‘Change History’. You can see the email address of the person who made the change along with a short description. Changes made by people who have been removed from Google Analytics will be listed as ‘Deleted User’.
Channel Channels provide top-level groupings of your inbound marketing. Each channel combines source and medium so you can understand overall performance. For example, the default channel grouping includes ‘Organic Search’, ‘Paid Search’, ‘Social’ and ‘Email’ which automatically combines pre-defined sources and mediums. You can also configure your own custom channel groupings.
Chatbots Chatbots chat robots that can converse with a human user through text or voice commands. They are used in e-commerce, education, health, and business settings for ease of communication and to answer user questions.
Channel conflict Channel Conflict when one channel member’s actions prevent the channel from achieving its goal.
Channel coordination Channel Coordination when channel members are brought together to advance the goals of the channel, as opposed to their own potentially incompatible goals.
Channel power Channel power the ability to alter channel members’ behavior so that they take actions they would not have taken otherwise.
Chukwa Apache Chukwa is an open source data collection system for monitoring large distributed systems. Apache Chukwa is built on top of the Hadoop Distributed File System (HDFS) and MapReduce framework and inherits Hadoop's scalability and robustness.
Churn An organization’s churn is the number of subscribers or customers who leave over a given period, often expressed as a rate relative to the customers it had at the start of that period. There are multiple ways to calculate churn depending on your business model. In a SaaS company, for example, if a customer comes on board with a one-year subscription and later leaves, they will have “churned.” The goal is to keep your churn rate as low as possible.
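One common way to express churn is as a rate: customers lost in a period divided by customers at the start of the period. The sketch below assumes that definition; other business models may call for a different formula, and the function name is illustrative.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of the starting customer base that left during the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


# 10 of 200 customers left during the period.
print(f"{churn_rate(200, 10):.1%}")  # prints 5.0%
```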
Classification Threshold Classification threshold is the value used to classify a new observation as 1 or 0. When a model outputs probabilities that have to be turned into classes, we choose a threshold value: if the probability is above the threshold we classify the observation as 1, and as 0 otherwise. To find a good threshold, one can plot the ROC curve, which shows the true positive rate against the false positive rate as the threshold changes, and pick the point with the best trade-off for the problem (for example, the largest difference between the two rates). The AUC itself summarizes the whole curve and does not depend on any single threshold.
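A minimal sketch of applying a threshold and comparing a few candidate values. It scores each candidate by true positive rate minus false positive rate (Youden's J), which is one possible selection rule among several; the data and function names are made up for illustration, and both classes are assumed to be present.

```python
def classify(probabilities, threshold=0.5):
    """Label an observation 1 when its probability is above the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


def youden_j(labels, probabilities, threshold):
    """True positive rate minus false positive rate at one threshold."""
    predictions = classify(probabilities, threshold)
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp / (tp + fn) - fp / (fp + tn)


def best_threshold(labels, probabilities, candidates):
    """Pick the candidate threshold with the largest Youden's J."""
    return max(candidates, key=lambda t: youden_j(labels, probabilities, t))


labels = [0, 0, 0, 1, 1, 1]
probabilities = [0.1, 0.3, 0.65, 0.6, 0.7, 0.8]
print(best_threshold(labels, probabilities, [0.2, 0.5, 0.75]))
```

In practice the right threshold also depends on the relative cost of false positives and false negatives, which this rule weighs equally.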
Classification Analysis A systematic process for obtaining important and relevant information about data (metadata) and assigning data to a particular group or class.
Clickstream Analytics Clickstream analysis is the process of looking at clickstream data for market research or other purposes. A clickstream is a rendering of user activity on a website, namely, where a user clicks on a computer display screen and how that movement translates to other Web activity.
Client ID Google Analytics uses a unique identifier, called ‘Client ID’ to report and analyze the behavior of individuals on your website. By default, the identifier is randomly assigned and is stored in a browser cookie on the users’ device.
Cloud Computing Cloud computing is a general term for anything that involves delivering hosted services over the Internet. These services are broadly divided into three categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). The name cloud computing was inspired by the cloud symbol that's often used to represent the Internet in flowcharts and diagrams.
Cloud A broad term that refers to any internet-based application or service that is hosted remotely.
Cloud computing Cloud computing is the use of various services, such as software development platforms, servers, storage and software, over the internet, often referred to as the "cloud."
Cloud Analytics With cloud analytics, you can do data analytics in a public or private cloud, typically through a Software as a Service (SaaS) model or alternatively by hosting a data warehouse (Platform as a Service – PaaS) in the cloud on which you can run your BI, analytic and reporting software.
Cloud computing Essentially, software and/or data hosted and running on remote servers and accessible from anywhere on the internet.
Clustering Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.
Cluster Analysis Cluster analysis is the process of grouping objects that are similar to each other into a common group (cluster), in order to understand the similarities and differences between them. It is an important task in exploratory data mining and a common strategy for statistical data analysis in fields such as image analysis, pattern recognition, machine learning, computer graphics and data compression.
Cluster Computing A computer cluster is a single logical unit consisting of multiple computers that are linked through a LAN. The networked computers essentially act as a single, much more powerful machine. A computer cluster provides much faster processing speed, larger storage capacity, better data integrity, superior reliability and wider availability of resources.
Clustering Analysis Cluster analysis is a statistical classification technique in which a set of objects or points with similar characteristics are grouped together in clusters. It encompasses a number of different algorithms and methods that are all used for grouping objects of similar kinds into respective categories. The aim of cluster analysis is to organize observed data into meaningful structures in order to gain further insight from them.
Clustering Clustering algorithm technique that allows machines to group similar data into larger data categories.
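To make the idea of grouping similar objects concrete, here is a small k-means sketch in plain Python. K-means is only one of many clustering algorithms, and the points and starting centroids below are invented for the example.

```python
def kmeans(points, centroids, iterations=10):
    """Very small k-means: assign points to the nearest centroid, then update."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            # Squared Euclidean distance from this point to every centroid.
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(clusters)
```

The fixed starting centroids keep the example deterministic; real implementations usually choose them randomly and rerun several times.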
Cluster computing It’s a fancy term for computing using a ‘cluster’ of pooled resources of multiple servers. Getting more technical, we might be talking about nodes, cluster management layer, load balancing, and parallel processing, etc.
Club Membership Programs Club Membership Programs programs open to everyone who purchases a product or service, or limited to an affinity group of those willing to pay a small fee.
Co-branding Co-branding (also dual branding or brand bundling) two or more well-known brands are combined into a joint product or marketed together in some fashion.
Coefficient of Variation The coefficient of variation (CV) is a statistical measure of the dispersion of data points in a data series around the mean. The coefficient of variation represents the ratio of the standard deviation to the mean, and it is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from one another.
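As a short illustration, the coefficient of variation can be computed as the standard deviation divided by the mean. The sketch below uses the sample standard deviation from Python's `statistics` module; the two data series are made up to show that the CV stays the same when the scale changes.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


small_scale = [10, 12, 11, 9, 13]
large_scale = [1000, 1200, 1100, 900, 1300]  # same shape, 100 times larger

print(round(coefficient_of_variation(small_scale), 4))
print(round(coefficient_of_variation(large_scale), 4))
```

Both series have very different means but the same relative spread, which is why the CV is useful for comparing them.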
Cognitive computing Cognitive computing computerized model that mimics human thought processes by data mining, NLP, and pattern recognition.
Cohort Analysis The Cohort Analysis report shows you users segmented by date. For example, you can use the report to see when users are acquired and when they return to your website.
Cohorts Cohorts groups of individuals born during the same time period who travel through life together.
Columnar Database A columnar database is a database management system (DBMS) that stores data in columns instead of rows. The goal of a columnar database is to efficiently write and read data to and from hard disk storage in order to speed up the time it takes to return a query.
Columnar Database or Column-oriented Database A database that stores data by column rather than by row. In a row-based database, a row might contain a name, address and phone number. In a column-oriented database, all names are in one column, addresses in another and so on. A key advantage of a columnar database is faster hard disk access.
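The difference between the two layouts can be sketched with ordinary Python structures. This is only an in-memory illustration of the idea, not how any particular database stores data on disk; the records are invented.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"name": "Ana", "city": "Lima", "spend": 120.0},
    {"name": "Ben", "city": "Oslo", "spend": 80.0},
    {"name": "Cho", "city": "Seoul", "spend": 200.0},
]

# Column-oriented layout: each field is stored as its own list.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An aggregate over one field touches every record in the row layout...
total_from_rows = sum(row["spend"] for row in rows)
# ...but only a single list in the column layout.
total_from_columns = sum(columns["spend"])

print(columns["name"], total_from_rows, total_from_columns)
```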
Collaborative Business Intelligence, or Collaborative BI Collaborative Business Intelligence, or Collaborative BI, is the marriage of traditional business intelligence tactics with tools like social networking, wikis, or blogs, to enhance the collaborative problem-solving nature of BI. Microsoft SharePoint is an example of a popular collaborative BI product.
Computer Vision Computer vision is a field of computer science that works on enabling computers to see, identify and process images in the same way that human vision does, and then provide appropriate output. It is like imparting human intelligence and instincts to a computer. In reality though, it is a difficult task to enable computers to recognize images of different objects.
Comparative Analytical-oriented Database Comparative analytic is a special type of data mining technology which compares large data sets, multiple processes or other objects using statistical strategies such as filtering, decision tree analytics, pattern analysis etc.
Complex Event Processing (CEP) Complex Event Processing (CEP) is the process of analyzing and identifying data and then combining it to infer events that are able to suggest solutions to the complex circumstances. The main task of CEP is to identify/track meaningful events and react to them as soon as possible.
Comparative Analysis Comparative analysis, also called comparison analysis, measures the financial relationships between variables over two or more reporting periods. Businesses use comparative analysis to identify their competitive positions and operating results over a defined period.
Companies Properties Companies Properties in HubSpot contain important information about the contact's company, such as the company name and the website URL. You'll find these properties in individual Contact records and in the Companies Report.
Computer vision Computer vision when a machine processes visual input from image files (JPEGs) or camera feeds.
Computer Vision A simple way to think of this is teaching machines how to visually interpret the world (teaching them to ‘see’). Perhaps a better way to describe computer vision is as the field of AI and computer science that deals with how machines can gain a level of understanding from images or videos.
Communication Adaptation Communication Adaptation changing marketing communications programs for each local market.
Communication-effect research Communication-effect research determining whether an ad is communicating effectively.
Company demand Company demand the company’s estimated share of market demand at alternative levels of company marketing effort in a given time period.
Company sales forecast Company sales forecast the expected level of company sales based on a chosen marketing plan and an assumed marketing environment.
Competitive advantage Competitive advantage a company’s ability to perform in one or more ways that competitors cannot or will not match.
Company sales potential Company sales potential the sales limit approached by company demand as company marketing effort increases relative to that of competitors.
Concordant-Discordant Ratio Concordant and discordant pairs are used to describe the relationship between pairs of observations. To calculate the concordant and discordant pairs, the data are treated as ordinal. The number of concordant and discordant pairs are used in calculations for Kendall’s tau, which measures the association between two ordinal variables.
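A minimal sketch of counting concordant and discordant pairs and turning the counts into a Kendall's tau value. It uses the simple tau-a form, which divides by the total number of pairs and makes no adjustment for ties; the rankings are invented for the example.

```python
from itertools import combinations


def concordant_discordant(x, y):
    """Count pairs of observations ordered the same way or the opposite way."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie, which counts as neither.
    return concordant, discordant


def kendall_tau_a(x, y):
    """(concordant - discordant) divided by the total number of pairs."""
    concordant, discordant = concordant_discordant(x, y)
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


rank_a = [1, 2, 3, 4]
rank_b = [1, 3, 2, 4]
print(concordant_discordant(rank_a, rank_b), kendall_tau_a(rank_a, rank_b))
```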
Confidence Interval A confidence interval is an interval, computed from sample data, that will contain a population parameter a specified proportion of the time under repeated sampling. That proportion is the confidence level; it can be any probability, with 95% and 99% being the most common choices.
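As an illustration, the sketch below builds a confidence interval for a sample mean using the normal approximation. That approximation is an assumption made for brevity; for small samples a t-distribution would normally be used instead. The measurements are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the sample mean."""
    # Two-sided critical value, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(sample) / sqrt(len(sample))
    centre = mean(sample)
    return centre - margin, centre + margin


measurements = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]
print(mean_confidence_interval(measurements, 0.95))
print(mean_confidence_interval(measurements, 0.99))
```

The 99% interval comes out wider than the 95% one: asking for more confidence costs precision.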
Confusion Matrix In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one.
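For a binary classifier the confusion matrix has four cells. The sketch below counts them from two label lists and derives accuracy from the counts; the labels are invented, and the dictionary layout is just one convenient representation.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count the four outcomes of a binary classifier (labels are 0 or 1)."""
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(1, 1)],  # actual 1, predicted 1
        "TN": counts[(0, 0)],  # actual 0, predicted 0
        "FP": counts[(0, 1)],  # actual 0, predicted 1
        "FN": counts[(1, 0)],  # actual 1, predicted 0
    }


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = confusion_matrix(actual, predicted)
accuracy = (matrix["TP"] + matrix["TN"]) / sum(matrix.values())
print(matrix, accuracy)
```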
Continuous Variable In mathematics, a variable may be continuous or discrete. If it can take on two particular real values such that it can also take on all real values between them, the variable is continuous in that interval.
Convergence Convergence refers to moving towards union or uniformity. An iterative algorithm is said to converge when as the iterations proceed the output gets closer and closer to a specific value.
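A small example of an iterative algorithm that converges: the Babylonian method for square roots, stopped once successive estimates differ by less than a tolerance. The method, the tolerance and the iteration cap are choices made for this sketch only.

```python
def babylonian_sqrt(value, tolerance=1e-10, max_iterations=100):
    """Iteratively approximate the square root of a positive number."""
    estimate = value if value > 1 else 1.0
    for iteration in range(1, max_iterations + 1):
        next_estimate = 0.5 * (estimate + value / estimate)
        # Converged: the output has stopped changing by more than the tolerance.
        if abs(next_estimate - estimate) < tolerance:
            return next_estimate, iteration
        estimate = next_estimate
    raise RuntimeError("did not converge within max_iterations")


root, steps = babylonian_sqrt(2.0)
print(root, steps)
```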
Convex Function In mathematics, a real-valued function defined on an n-dimensional interval is called convex if the line segment between any two points on the graph of the function lies above or on the graph. Equivalently, a function is convex if its epigraph is a convex set.
Continuous Data Data from a measurement scale that can be divided into finer and finer increments (e.g. temperature, time, pressure). Also known as variable data.
Contact-to-Customer Conversion Rate To put it simply, your conversion rate is the percentage of visitors to your website or landing page that convert (aka, do what you want them to do). Depending on your business goals, a “conversion” could be almost anything.
Contacts A contact is someone who has submitted their information in a form on your website. Contacts can be in different Lifecycle Stages such as lead, marketing qualified lead, customer, and evangelist. The term contacts can be found in HubSpot.
Contacts Properties Contacts Properties in HubSpot contain important information about the individual, such as the contact's name, email, and address.
Content Grouping The ability to view and compare metrics that have been aggregated into a group. You can analyze the group's aggregated data, individual URLs, page titles, or screen names in Google Analytics.
Conversion Rate The number of people who converted on your website (typically filling out a form or another action you have predefined) divided by the number of people who visited your website.
Conversion Type This option will let you define what a conversion is in the report you're running. For example, in HubSpot, you could select "Became a Lead Date" to figure out when your visitor turned into a lead. You could select "Became a Customer Date" to figure out when your lead turned into a customer.
Conversion Rate Conversion rate is calculated when users take a particular, defined action that you want them to take. The most common measurement for conversion is the number of unique website visitors that convert to paying customers. However, conversions could be the number of people who sign up for a free trial, make a purchase, download a white paper, click on an ad, etc. Whatever your goal is, the conversion rate is what you will be measuring and optimizing to deliver the best customer experience possible.
Content Group You can configure content groups to classify each page of your website into a particular category. This allows you to perform top-level reporting and analysis on your pages based on your own content classifications. You can create content groups by modifying your tracking code, by extracting details from your pages or by creating rules.
Conversion A conversion is reported whenever a user completes a goal or makes a purchase during a session. Each goal will report a maximum of one conversion per session, while every transaction is reported.
Contextual data Contextual data a structuring of big data that attaches situational contexts to single elements of big data to enrich them with business meaning
Convolutional neural network (CNN) Convolutional neural network (CNN) a type of neural network specifically created for analyzing and classifying visual imagery. It applies learned convolutional filters across the image, usually followed by fully connected (multilayer perceptron) layers.
Consciousness Difficult to define but undeniable when it’s present, consciousness is widely regarded as the state of subjective experience – the state or quality of awareness.
Conformance quality Conformance quality the degree to which all the produced units are identical and meet the promised specifications.
Conjoint analysis Conjoint analysis a method for deriving the utility values that consumers attach to varying levels of a product’s attributes.
Conjunctive heuristic Conjunctive heuristic the consumer sets a minimum acceptable cutoff level for each attribute and chooses the first alternative that meets the minimum standard for all attributes.
Consumer behavior Consumer behavior the study of how individuals, groups, and organizations select, buy, use, and dispose of goods, services, ideas, or experiences to satisfy their needs and wants.
Consumer involvement Consumer involvement the level of engagement and active processing undertaken by the consumer in responding to a marketing stimulus.
Consumerist movement Consumerist movement an organized movement of citizens and government to strengthen the rights and powers of buyers in relation to sellers.
Consumption system Consumption system the way the user performs the tasks of getting and using products and related services.
Containerization Containerization putting the goods in boxes or trailers that are easy to transfer between two transportation modes.
Contractual sales force Contractual sales force manufacturers’ reps, sales agents, and brokers, who are paid a commission based on sales.
Convenience goods Convenience goods goods the consumer purchases frequently, immediately, and with a minimum of effort.
Cookie A cookie is a piece of information that is stored in a web browser. Google Analytics uses cookies to identify users. If someone does not have an existing cookie, then a new cookie will be created and they will appear as a new user in your reports. If someone has an existing cookie, then they will be reported as a returning user and the cookie expiration will be updated.
Correlation Correlation is a statistical measure that indicates the extent to which two or more variables fluctuate together. A positive correlation indicates the extent to which those variables increase or decrease in parallel; a negative correlation indicates the extent to which one variable increases as the other decreases.
Correlation Analysis A means to determine a statistical relationship between variables, often for the purpose of identifying predictive factors among the variables. A technique for quantifying the strength of the linear relationship between two variables.
Core benefit Core benefit the service or benefit the customer is really buying.
Core Competency Core Competency attribute that (1) is a source of competitive advantage in that it makes a significant contribution to perceived customer benefits, (2) has applications in a wide variety of markets, (3) is difficult for competitors to imitate.
Core values Core values the belief systems that underlie consumer attitudes and behavior, and that determine people’s choices and desires over the long term.
Corporate culture Corporate culture the shared experiences, stories, beliefs, and norms that characterize an organization.
Corporate retailing Corporate retailing corporately owned retailing outlets that achieve economies of scale, greater purchasing power, wider brand recognition, and better-trained employees.
Cosine Similarity Cosine similarity is a metric used to measure how similar the documents are irrespective of their size. Mathematically, it measures the cosine of the angle between two vectors projected in a multi-dimensional space. The cosine similarity is advantageous because even if the two similar documents are far apart by the Euclidean distance (due to the size of the document), chances are they may still be oriented closer together. The smaller the angle, higher the cosine similarity.
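A minimal sketch of cosine similarity as the dot product divided by the product of the vector lengths. The two "documents" are hypothetical term-count vectors, with the long one simply ten times the short one.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


short_doc = [1, 2, 0]    # hypothetical term counts
long_doc = [10, 20, 0]   # same proportions, ten times longer
unrelated = [0, 0, 5]

print(cosine_similarity(short_doc, long_doc))
print(cosine_similarity(short_doc, unrelated))
```

The first pair is far apart by Euclidean distance yet points in the same direction, so its similarity is at the maximum; the second pair shares no terms and scores zero.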
Cost Function A cost function is a mathematical formula used to chart how production expenses will change at different output levels. In other words, it estimates the total cost of production given a specific quantity produced.
Cost Analysis After uploading third-party advertising data (see Data Import) you can then compare the performance of your advertising based on a range of metrics, including click-through rate, cost-per-click, revenue-per-click, and return on advertising spend.
Countertrade Countertrade offering other items in payment for purchases.
Covariance In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values, the covariance is positive.
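The sign behaviour described above can be checked with a short sample-covariance function. The sketch divides by n - 1 (the sample form), and the data are invented.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


x = [1, 2, 3, 4, 5]
rises_with_x = [2, 4, 6, 8, 10]
falls_with_x = [10, 8, 6, 4, 2]

print(covariance(x, rises_with_x))  # positive: the variables move together
print(covariance(x, falls_with_x))  # negative: one rises as the other falls
```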
CPC Cost-per-click or CPC can be seen in the Acquisition reports and typically refers to people clicking through to your website from paid ads. This includes traffic from linked Google AdWords accounts and campaign tagged URLs where the medium has been defined as ‘cpc’ or ‘paid’.
Criteria Criteria a principle or standard by which something may be judged or decided.
Critical path scheduling (CPS) Critical path scheduling (CPS) network planning techniques to coordinate the many tasks in launching a new product.
CRM Customer relationship management, denoting strategies and software that enable a company to optimize its customer relations.
Cross Entropy Cross-entropy is commonly used to quantify the difference between two probability distributions. Usually the "true" distribution (the one that your machine learning algorithm is trying to match) is expressed in terms of a one-hot distribution.
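A minimal sketch of cross-entropy between a one-hot "true" distribution and a predicted distribution, using the natural logarithm. The probabilities are made up; with a one-hot target the result reduces to minus the log of the probability assigned to the correct class.

```python
from math import log


def cross_entropy(true_dist, predicted_dist):
    """Cross-entropy H(true, predicted) in nats; skips zero-probability targets."""
    return -sum(t * log(p) for t, p in zip(true_dist, predicted_dist) if t > 0)


one_hot = [0, 1, 0]                # the correct class is the second one
hesitant = [0.1, 0.7, 0.2]
confident = [0.05, 0.9, 0.05]

print(cross_entropy(one_hot, hesitant))
print(cross_entropy(one_hot, confident))
```

The prediction that puts more probability on the correct class gets the lower cross-entropy.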
Cross Validation Cross-validation is a model validation technique for assessing how the results of a statistical analysis (model) will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.
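One widely used form is k-fold cross-validation, where the data are split into k parts and each part takes a turn as the held-out test set. The sketch below only generates the index splits, without shuffling; fitting and scoring a model on each split is left out, and the function name is illustrative.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k consecutive folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_indices(10, 3):
    print("train:", train, "test:", test)
```

Every sample appears in exactly one test fold, so each observation is used once to check a model that was not trained on it.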
Cross Device The Cross Device reports provide insights into people who are using multiple devices to visit your website. The automated Cross Device reports require Google signals to be enabled. These reports provide insights based on aggregated and anonymized data from people logged into their Google account. You can also send identifiers to Google Analytics, which allow you to make use of the Cross Device reports with user ID.
Cube Cube Multi-dimensional sections of data built from tables and fields in your database. Cubes contain calculations and formulae and are often grouped around specific business functions such as sales, finance, purchasing, inventory, etc. Each cube contains contextual, pertinent, and useful metrics for that particular area of the business.
Cues Cues stimuli that determine when, where, and how a person responds.
Custom Dimension / Custom Metric In addition to the default dimensions and metrics, Google Analytics can be configured to collect additional data and make it available in your reports. For example, you could configure a custom dimension to report the authors of each page on your website, to understand performance based on who is creating content.
Custom Segment Apart from the default (or system) segments, you can also create custom segments to filter the data that is (or is not) included in your reports. Segments can be configured to focus on particular sections of your traffic based on users and sessions. For example, you can create a custom segment to perform detailed analysis on your top-performing customers to understand how they’re engaging with your website.
Customer-based Brand equity Customer-based Brand equity the differential effect that brand knowledge has on a consumer response to the marketing of that brand.
Customer Consulting Customer Consulting data, information systems, and advice services that the seller offers to buyers.
Customer Database Customer Database an organized collection of comprehensive information about individual customers or prospects that is current, accessible, and actionable for marketing purposes.
Customer equity Customer equity the sum of lifetime values of all customers.
Customer Lifetime Value (CLV) Customer lifetime Value (CLV) the net present value of the stream of future profits expected over the customer’s lifetime purchases.
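The net present value in this definition can be sketched as a discounted sum of expected profits. The example assumes yearly profits received at the end of each year and a constant discount rate; both are simplifying assumptions, and the figures are invented.

```python
def customer_lifetime_value(annual_profits, discount_rate):
    """Net present value of expected yearly profits from one customer."""
    return sum(
        profit / (1 + discount_rate) ** year
        for year, profit in enumerate(annual_profits, start=1)
    )


# Three years of 100 in expected profit, discounted at 10% per year.
clv = customer_lifetime_value([100, 100, 100], 0.10)
print(round(clv, 2))
```

Summing this value over all customers gives the customer equity defined in the previous entry.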
Customer mailing list Customer mailing list a set of names, addresses, and telephone numbers.
Customer-management organization Customer-management organization deals with individual customers rather than the mass market or even market segments.
Customer Perceived Value (CPV) Customer Perceived Value (CPV) the difference between the prospective customer’s evaluation of all the benefits and all the costs of an offering and the perceived alternatives.
Customer-performance scorecard Customer-performance scorecard how well the company is doing year after year on particular customer-based measures.
Customer Profitability Analysis (CPA) Customer Profitability Analysis (CPA) a means of assessing and ranking customer profitability through accounting techniques such as activity-based costing (ABC).
Customer Relationship Management (CRM) Customer Relationship Management (CRM) the process of carefully managing detailed information about individual customers and all customer “touch points” to maximize loyalty.
Customer Training Customer Training training the customer’s employees to use the vendor’s equipment properly and efficiently.
Customer Value Analysis Customer Value Analysis report of the company’s strengths and weaknesses relative to various competitors.
Customer-value Hierarchy Customer-value Hierarchy five product levels that must be addressed by marketers in planning a market offering.
Customerization Customerization combination of operationally driven mass customization with customized marketing in a way that empowers consumers to design the product and service offering of their choice.
Cyborg Officially ‘Cybernetic Organism’ but usually used to refer to a hybrid of human and machine.


Dark Data Dark data is a type of unstructured, untagged and untapped data that is found in data repositories and has not been analyzed or processed. It is similar to big data but differs in how it is mostly neglected by business and IT administrators in terms of its value.
Dashboard A dashboard is an information management tool used to visually track, analyze and display key performance indicators, metrics and key data points. Dashboards can be customised to fulfil the requirements of a project. A dashboard can connect to files, attachments, services and APIs, whose data is displayed in the form of tables, line charts, bar charts and gauges. Popular tools for building dashboards include Excel and Tableau.
Data In computing, data is information that has been translated into a form that is efficient for movement or processing. Relative to today's computers and transmission media, data is information converted into binary digital form. It is acceptable for data to be used as a singular subject or a plural subject. Raw data is a term used to describe data in its most basic digital format.
Data Aggregation Data aggregation is a type of data and information mining process where data is searched, gathered and presented in a report-based, summarized format to achieve specific business objectives or processes and/or conduct human analysis.
Data Analysis Expressions (DAX) Data Analysis Expressions (DAX) Provides a specialized syntax for querying Analysis Services. DAX includes some of the functions that are used in Excel formulas as well as additional functions that are designed to work with relational data and perform dynamic aggregation. DAX can compute values for seven different data types: Integer, Real, Currency, Date, Boolean, String and BLOB (binary large object).
Data Analytics Data analytics is the science of analyzing raw data in order to make conclusions about that information. Many of the techniques and processes of data analytics have been automated into mechanical processes and algorithms that work over raw data for human consumption.
Data Architect A data architect is an individual who is responsible for designing, creating, deploying and managing an organization's data architecture. Data architects define how the data will be stored, consumed, integrated and managed by different data entities and IT systems, as well as any applications using or processing that data in some way.
Data Architecture Data architecture is a set of rules, policies, standards and models that govern and define the type of data collected and how it is used, stored, managed and integrated within an organization and its database systems. It provides a formal approach to creating and managing the flow of data and how it is processed across an organization’s IT systems and applications.
Data as a Service (DaaS) Data as a service (DaaS) is a cloud strategy used to facilitate the accessibility of business-critical data in a well-timed, protected and affordable manner. DaaS depends on the principle that specified, useful data can be supplied to users on demand, irrespective of any organizational or geographical separation between consumers and providers.
Data Broker A business that collects personal information about consumers and sells that information to other organizations.
Data Center Data centers are simply centralized locations where computing and networking equipment is concentrated for the purpose of collecting, storing, processing, distributing or allowing access to large amounts of data. They have existed in one form or another since the advent of computers.
Data Cleansing Data cleansing is the process of altering data in a given storage resource to make sure that it is accurate and correct. There are many ways to pursue data cleansing in various software and data storage architectures; most of them center on the careful review of data sets and the protocols associated with any particular data storage technology. Data cleansing is also known as data cleaning or data scrubbing.
Data Engineering Data engineering is all about the back end. Data engineers are the people who build systems that make it easy for data scientists to do their analysis. In smaller teams, a data scientist may also be a data engineer. In larger groups, engineers are able to focus solely on speeding up analysis and keeping data well organized and easy to access.
Data Ethical Guidelines Guidelines that help organizations be transparent with the data, ensuring simplicity, security and privacy.
Data Exploration The part of the data science process where a scientist asks basic questions that help her understand the context of a data set. What you learn during the exploration phase will guide more in-depth analysis later. Further, it helps you recognize when a result might be surprising and warrant further investigation.
Data Feed A data feed is a mechanism for delivering data streams from a server to a client automatically or on demand. The data feed is usually a defined file format that the client application understands that contains timely information that may be useful to the application itself or to the user.
Data Governance Data governance is a collection of processes, roles, policies, standards, and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals. It establishes the processes and responsibilities that ensure the quality and security of the data used across a business or organization. Data governance defines who can take what action, upon what data, in what situations, using what methods.
Data Integration Data integration is a process in which heterogeneous data is retrieved and combined into a unified form and structure. Data integration allows different data types (such as data sets, documents and tables) to be merged by users, organizations and applications, for use in personal or business processes and/or functions.
Data Integrity Data integrity is the overall completeness, accuracy and consistency of data. This can be indicated by the absence of alteration between two instances or between two updates of a data record, meaning data is intact and unchanged. Data integrity is usually imposed during the database design phase through the use of standard procedures and rules. Data integrity can be maintained through the use of various error-checking methods and validation procedures.
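One simple error-checking method is to store a checksum with a record and recompute it later. The sketch below assumes SHA-256 as the checksum and uses made-up record contents. It illustrates the idea only and is not a complete integrity scheme.

```python
import hashlib


def checksum(record: bytes) -> str:
    """Return the SHA-256 hex digest of a record's bytes."""
    return hashlib.sha256(record).hexdigest()


# Hypothetical record; the digest is stored when the record is written.
original = b"id=42;name=Ada;balance=100.00"
stored_digest = checksum(original)

# Later, the record is read back and its digest is compared to the stored one.
unchanged = b"id=42;name=Ada;balance=100.00"
altered = b"id=42;name=Ada;balance=900.00"

print(checksum(unchanged) == stored_digest)  # True: data is intact
print(checksum(altered) == stored_digest)    # False: data was altered
```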
Data Intelligence Focuses on internal data used for future endeavors and is sometimes mistakenly labelled as business intelligence. Whereas business intelligence involves organizing, rather than just gathering data to make it useful and applicable to the business’s practices, data intelligence focuses on extrapolating data to assess future services or investments.
Data Journalism This discipline is all about telling interesting and important stories with a data focused approach. It has come about naturally with more information becoming available as data. A story may be about the data or informed by data. There’s a full handbook if you’d like to learn more.
Data Lake A large repository of enterprise-wide data in raw format. Supposedly data lakes make it easy to access enterprise-wide data. However, you really need to know what you are looking for and how to process it and make intelligent use of it.
Data Lineage Data lineage is defined as a type of data life cycle that includes the data’s origins and where it moves over time. This term can also describe what happens to data as it goes through diverse processes.
Data Management The process by which data is acquired, validated, stored, protected, and processed. In turn, its accessibility, reliability, and timeliness are ensured to satisfy the needs of the data users. Data management properly oversees the full data lifecycle needs of an enterprise.
Data Mart A data mart is a repository of data that is designed to serve a particular community of knowledge workers. Data marts enable users to retrieve information for single departments or subjects, improving the user response time. Because data marts catalog specific data, they often require less space than enterprise data warehouses, making them easier to search and cheaper to run.
Data Mining Data mining is the process of analyzing data from different perspectives to uncover hidden patterns and categorize them into useful information. The data is typically collected and assembled in common areas, such as data warehouses, so that data mining algorithms can analyze it efficiently, facilitating business decision making and other information requirements to ultimately cut costs and increase revenue.
Data Model and Data Modelling Data modeling is the process of documenting a complex software system design as an easily understood diagram, using text and symbols to represent the way data needs to flow. The diagram can be used to ensure efficient use of data, as a blueprint for the construction of new software or for re-engineering a legacy application.
Data Pipelines A collection of scripts or functions that pass data along in a series. The output of the first method becomes the input of the second. This continues until the data is appropriately cleaned and transformed for whatever task a team is working on.
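A minimal sketch of such a pipeline, where each step's output feeds the next. The three cleaning steps and the sample rows are hypothetical and stand in for whatever transformations a team actually needs.

```python
from functools import reduce


def strip_whitespace(rows):
    """Remove leading and trailing whitespace from every row."""
    return [row.strip() for row in rows]


def drop_empty(rows):
    """Discard rows that are empty strings."""
    return [row for row in rows if row]


def to_lowercase(rows):
    """Normalize every row to lower case."""
    return [row.lower() for row in rows]


def run_pipeline(data, steps):
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda acc, step: step(acc), steps, data)


raw = ["  Alice ", "", "BOB", "  "]
clean = run_pipeline(raw, [strip_whitespace, drop_empty, to_lowercase])
print(clean)  # ['alice', 'bob']
```

Keeping each step a small function that takes and returns the same kind of object makes steps easy to reorder or test in isolation.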
Data Point A data point is a discrete unit of information. In a general sense, any single fact is a data point. In a statistical or analytical context, a data point is usually derived from a measurement or research and can be represented numerically and/or graphically.
Data Retention In order to comply with privacy regulations, you can set a data retention period in Google Analytics. By default, data that can identify unique individuals, like Client ID, will be removed after 26 months. The data retention period can be set to 14, 26, 38, or 50 months, and you also have the option of keeping the data by selecting ‘do not expire automatically’. Aggregated data will continue to be available in your reports even after the data retention period.
Data Science Data science is a broad field that refers to the collective processes, theories, concepts, tools and technologies that enable the review, analysis and extraction of valuable knowledge and information from raw data. It is geared toward helping individuals and organizations make better decisions from stored, consumed and managed data.
Data Scientist Someone who can make sense of big data by extracting raw data, massaging it and coming up with insights. Skills needed are statistics, computer science, creativity, story-telling and understanding of business context.
Data Source Name (DSN) Data Source Name (DSN) is a data structure that contains the information about a specific database that an Open Database Connectivity (ODBC) driver needs in order to connect to it.
Data Transformation Data transformation is the process of converting data or information from one format to another, usually from the format of a source system into the required format of a new destination system. The usual process involves converting documents, but data conversions sometimes involve the conversion of a program from one computer language to another to enable the program to run on a different platform. The usual reason for this data migration is the adoption of a new system that's totally different from the previous one.
Data Vault Modeling Data vault modeling is a database modeling method that is designed to provide long-term historical storage of data coming in from multiple operational systems.
Data Visualization Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. In the world of Big Data, data visualization tools and technologies are essential to analyze massive amounts of information and make data-driven decisions.
Data Warehouse The data warehouse is a system for storing data for the purpose of analysis and reporting. It is considered a core component of business intelligence. Data stored in the warehouse is uploaded from operational systems such as sales or marketing.
Data Warehouse A data warehouse is a system used to do quick analysis of business trends using data from many sources. They’re designed to make it easy for people to answer important statistical questions without a Ph.D. in database architecture.
Data Warehouse Developer A data warehouse developer's primary role is to develop and deploy code. They receive direction from the ETL architect and directly build ETL functions. Expertise and experience in ETL are needed, but the specifics vary depending on which ETL tool(s) are used.
Data Wrangling (Munging) The process of taking data in its original form and “taming” it until it works better in a broader workflow or project. Taming means making values consistent with a larger data set, replacing or removing values that might affect analysis or performance later, etc. Wrangling and munging are used interchangeably.
Database A database is a collection of information that is organized so that it can be easily accessed, managed and updated. Computer databases typically contain aggregations of data records or files, containing information about sales transactions or interactions with specific customers.
Database Administrator (DBA) A database administrator (DBA) is a specialized computer systems administrator who maintains a successful database environment by directing or performing all related activities to keep the data secure. The top responsibility of a DBA professional is to maintain data integrity. This means the DBA will ensure that data is secure from unauthorized access but is available to users.
Database Management System (DBMS) A database management system (DBMS) is a software package designed to define, manipulate, retrieve and manage data in a database. A DBMS generally manipulates the data itself, the data format, field names, record structure and file structure. It also defines rules to validate and manipulate this data.
Database Marketing Database marketing is the process of building, maintaining, and using customer databases and other databases for the purpose of contacting, transacting, and building customer relationships.
Dataframe A data frame is a table or a two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column. Data frame APIs usually support more or less elaborate methods for slicing and dicing the data, such as "selecting" rows, columns, and cells by name or by number, filtering out rows, "recoding" column and row names, and normalizing data.
Dataset A data set is a collection of related, discrete items of data that may be accessed individually or in combination or managed as a whole entity.
DBScan Density-based spatial clustering of applications with noise is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996.
Decile A decile is a quantitative method of splitting up a set of ranked data into 10 equally large subsections. This type of data ranking is performed as part of many academic and statistical studies in the finance and economics fields. The data may be ranked from largest to smallest values, or vice versa.
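As a rough illustration, the nine cut points that separate the ten groups can be computed with Python's standard `statistics.quantiles`. The data (the integers 1 to 100) and the helper `decile_of` are hypothetical, and the cut points follow the function's default "exclusive" method.

```python
import statistics

# Hypothetical data set: the integers 1..100.
scores = list(range(1, 101))

# n=10 yields the 9 boundaries between the 10 deciles.
cut_points = statistics.quantiles(scores, n=10)


def decile_of(value, cuts):
    """Return the decile (1..10): one plus the number of cut points below value."""
    return 1 + sum(value > c for c in cuts)


# Count how many values land in each decile.
counts = {}
for s in scores:
    d = decile_of(s, cut_points)
    counts[d] = counts.get(d, 0) + 1

print(len(cut_points))  # 9
print(counts)           # each of the 10 deciles holds 10 values
```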
Decision Boundary In a statistical-classification problem with two classes, a decision boundary or decision surface is a hypersurface that partitions the underlying vector space into two sets, one for each class.
Decision Tree A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.
Declining Demand Declining demand occurs when consumers begin to buy the product less frequently or not at all.
Deep Learning Deep learning is a machine learning technique that teaches computers to learn by example, as a human mind would, typically using neural networks with many layers for tasks such as classification.
Deep Metaphors Deep metaphors are basic frames or orientations that consumers have toward the world around them.
Degree of Freedom The degrees of freedom in a statistical calculation represent how many values involved in a calculation have the freedom to vary. The degrees of freedom can be calculated to help ensure the statistical validity of chi-square tests, t-tests and even the more advanced F-tests. These tests are commonly used to compare observed data with data that would be expected to be obtained according to a specific hypothesis.
Delivery Delivery is how well the product or service is delivered to the customer.
Demand chain Planning Demand chain planning is the process of designing the supply chain based on adopting a target market perspective and working backward.
Demand-side Method The demand-side method identifies the effect sponsorship has on consumers’ brand knowledge.
Demographic Data Demographic data refers to data that is statistically socio-economic in nature such as population, race, income, education and employment, which represent specific geographic locations and are often associated with time. For example, when referring to population demographic data, we have characteristics such as area population, population growth or birthrate, ethnicity, density and distribution. With regard to employment, we have employment and unemployment rates, which can be related further to gender and ethnicity.
Dependent Variable A dependent variable is what you measure and which is affected by independent / input variable(s). It is called dependent because it “depends” on the independent variable. For example, let’s say we want to predict the smoking habits of people. Then whether the person smokes (“yes” or “no”) is the dependent variable.
Descriptive Statistics Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way such that, for example, patterns might emerge from the data. Descriptive statistics do not, however, allow us to make conclusions beyond the data we have analysed or reach conclusions regarding any hypotheses we might have made. They are simply a way to describe our data.
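A short sketch of typical descriptive statistics, computed with Python's standard `statistics` module. The sample values are made up for illustration.

```python
import statistics

# Hypothetical sample, e.g. pages viewed per session.
pages_per_session = [2, 4, 4, 5, 7, 8]

summary = {
    "count": len(pages_per_session),
    "mean": statistics.mean(pages_per_session),
    "median": statistics.median(pages_per_session),
    "mode": statistics.mode(pages_per_session),
    "stdev": statistics.stdev(pages_per_session),  # sample standard deviation
    "min": min(pages_per_session),
    "max": max(pages_per_session),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

These numbers summarize the sample itself. They say nothing about a wider population without inferential methods.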
Design Design is the totality of features that affect how a product looks, feels, and functions to a consumer.
Dimension A category that can be used to arrange data by facts and measures for data dicing (grouping) and slicing (filtering) purposes. Commonly used dimensions are people, products, places, and time.
Dimension Table Dimension table is a companion to a fact table. Dimension tables contain descriptive fields that are traditionally textual. Dimension tables are related to fact tables (which contain measures) through the use of keys (such as surrogate keys).
Dimensionality Reduction Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It converts a data set with a large number of dimensions into data with fewer dimensions while ensuring that it conveys similar information concisely.
Direct Marketing Direct marketing is the use of consumer-direct (CD) channels to reach and deliver goods and services to customers without using marketing middlemen.
Direct Product Profitability (DPP) Direct product profitability (DPP) is a way of measuring a product’s handling costs from the time it reaches the warehouse until a customer buys it in the retail store.
Direct Traffic Anyone who has come to your site by typing your domain name (such as www.datasetsdb.com) in their browser or used a saved bookmark to access your site will be attributed to direct traffic.
Direct-order Marketing Direct-order marketing is marketing in which direct marketers seek a measurable response, typically a customer order.
Discrete Data Data which is not measured on a continuous scale. Examples are binomial (pass/fail), Counts per unit, Ordinal (small/medium/large) and Nominal (red/green/blue). Also known as attribute or categorical data.
Discriminant Analysis Discriminant Analysis is a statistical tool with an objective to assess the adequacy of a classification, given the group memberships; or to assign objects to one group among a number of groups. For any kind of Discriminant Analysis, some group assignments should be known beforehand.
Discrimination Discrimination is the process of recognizing differences in sets of similar stimuli and adjusting responses accordingly.
Display ads Display ads are small, rectangular boxes containing text and perhaps a picture to support a brand.
Dissociative Groups Dissociative groups are those groups whose values or behavior an individual rejects.
Distributed File System A data storage system designed to store large volumes of data across multiple storage devices, which helps decrease the cost and complexity of storing large amounts of data.
Distribution Programming Distribution programming is building a planned, professionally managed, vertical marketing system that meets the needs of both manufacturer and distributors.
Document Management Document management, often referred to as a document management system, is software used to track, store, and manage electronic documents and electronic images of paper documents captured through a scanner. It is one of the basic big data terms you should know to start a big data career.
Dplyr Dplyr is a new package which provides a set of tools for efficiently manipulating datasets in R. dplyr is the next iteration of plyr, focussing on only data frames. dplyr is faster, has a more consistent API and should be easier to use.
Drill Apache Drill is an open source, distributed, low latency SQL query engine for Hadoop. It is built for semi-structured or nested data and can handle fixed schemas. Drill is similar in some aspects to Google’s Dremel and is maintained by Apache.
Drillthrough/Drill down In BI terminology, ‘drill down’ refers to moving from high level to detailed, transactional data by focusing in on something (a particular number in a report, for example). In a visualization environment (such as Jet Reports), “drilling down” may involve clicking on some representation in order to reveal more detail.
Drive A drive is a strong internal stimulus impelling action.
Dual Adaptation Dual adaptation is adapting both the product and the communications to the local market.
Dummy Variable Dummy variable is another name for a Boolean (indicator) variable: it takes the value 0 or 1. By convention, 1 means the condition is true (e.g. age < 25) and 0 means it is false (e.g. age >= 25).
Dumping Dumping is a situation in which a company charges either less than its costs or less than it charges in its home market, in order to enter or win a market.
Durability Durability is a measure of a product’s expected operating life under natural or stressful conditions.


E-business E-business is the use of electronic means and platforms to conduct a company’s business.
E-commerce E-commerce occurs when a company or site offers to transact or facilitate the selling of products and services online.
Early Stopping In machine learning, early stopping is a form of regularization used to avoid overfitting when training a learner with an iterative method, such as gradient descent. Such methods update the learner so as to make it better fit the training data with each iteration.
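A common way to apply early stopping is to halt training once the loss on a held-out validation set stops improving for a set number of iterations (the "patience"). The sketch below assumes that rule. The function name, the patience value and the loss numbers are hypothetical, and the list of losses stands in for a real training loop.

```python
def train_with_early_stopping(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    val_losses[i] is the validation loss observed after epoch i.
    Training stops once `patience` consecutive epochs fail to improve
    on the best loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch  # stop early
    return len(val_losses) - 1, best_epoch  # ran out of epochs


# Hypothetical losses: improve until epoch 3, then get worse (overfitting).
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_epoch, best_epoch = train_with_early_stopping(losses, patience=2)
print(stop_epoch, best_epoch)  # 5 3
```

In practice the model weights from `best_epoch` are usually the ones kept.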
Ecommerce Conversion An ecommerce conversion occurs when someone successfully purchases during a session. Google Analytics has a range of ecommerce dimensions and metrics to report on your website’s ecommerce activity.
EDA Exploratory Data Analysis (EDA) is the first step in your data analysis process. Here, you make sense of the data you have and then figure out what questions you want to ask and how to frame them, as well as how best to manipulate your available data sources to get the answers you need.
Elimination-by-aspects Heuristic The elimination-by-aspects heuristic is a situation in which the consumer compares brands on an attribute selected probabilistically, and brands are eliminated if they do not meet minimum acceptable cutoff levels.
Embodied AI The idea of embodied AI comes from that of embodied cognition which suggests that intelligence is as much a part of the body as it is a part of the brain. With this in mind, embodied AI (for example, bringing sensory input and robotics into the equation) has a beneficial effect on the cognitive function of the AI, allowing it to better understand its situation and surroundings for a more thorough data analysis and response processing.
Empirical Model An equation derived from the data that expresses a relationship between the inputs and an output (Y=f(x)).
Engagement Rate The engagement rate shows how long a person is on your website. It takes into account time in addition to the number of pages viewed. For example, if only one page is viewed, that visitor receives an engagement rate of 0. This metric can be found in Google Analytics.
Engagement Rate Engagement rate is a term used to measure how “engaged” a visitor is with your brand. This can be calculated in multiple ways. For example, a visitor who came to your website, clicked on multiple pages, stayed for an extended period and shared one of your social media posts would have a higher engagement rate than a visitor who only came to your website once.
Entrance The first page that someone views during a session is known as an entrance. You can see the number of times a page was viewed first using the ‘entrance’ metric. This metric is similar to sessions but can vary when multiple hit types are sent to Google Analytics.
Entrances Entrances is the number of times a session in Google Analytics begins. For example, let's say someone went to your homepage and a landing page before leaving your website. There would be one entrance counted on your homepage, and zero entrances counted on the landing page. That's because someone came to your website for the first time when they saw your homepage.
Environmental Threat An environmental threat is a challenge posed by an unfavorable trend or development that would lead to lower sales or profit.
ERP Enterprise resource planning (ERP) is the management of all the information and resources involved in a company’s operations by means of an integrated computer system.
Ethnographic Research Ethnographic research is a particular observational research approach that uses concepts and tools from anthropology and other social science disciplines to provide deep cultural understanding of how people live and work.
ETL (Extract, Transform, Load) ETL (Extract, Transform, Load) is three separate functions combined into a single programming tool. First, the extract function reads data from a specified source database and extracts a desired subset of data. Next, the transform function works with the acquired data – using rules or lookup tables, or creating combinations with other data – to convert it to the desired state. Finally, the load function is used to write the resulting data (either all of the subset or just the changes) to a target database, which may or may not previously exist.
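The three functions can be sketched with Python's standard library. The CSV contents, the `sales` table, and the rule of skipping rows with a missing amount are all hypothetical choices for illustration. An in-memory SQLite database stands in for the target database.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real job would read from a file or database.
SOURCE_CSV = """id,name,amount
1,alice,10.50
2,bob,
3,carol,7.25
"""


def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: skip rows with no amount, fix types, and title-case names."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append((int(row["id"]), row["name"].title(), float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write the rows to the target table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
result = conn.execute("SELECT name, amount FROM sales ORDER BY id").fetchall()
print(result)  # [('Alice', 10.5), ('Carol', 7.25)]
```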
ETL vs ELT Extract, Transform and Load, or ETL, is the traditional way of loading data into a data warehouse, where the data is copied to a staging area, transformed into the correct format and loaded into the warehouse. Extract, Load and Transform, or ELT, is a different methodology where instead of transforming the data before it’s written, it is transformed in place in the target system. This leverages the power of the target data engine or appliance and reduces load times.
Event An event is an action launched by an external hardware device and manipulated by software code. Events allow objects to notify client objects about important activities. Events provide tremendous flexibility compared to traditional console applications, which follow a rigid execution path and are limited by hard wiring. Unlike fields, events are members of an interface.
Everyday Low Pricing (EDLP) Everyday low pricing (EDLP) is, in retailing, a constant low price with few or no price promotions and special sales.
Excel One of the most used spreadsheet applications on the market. There’s no way you haven’t come into contact with Excel. It’s used in data science for obvious reasons, but it’s used in practically every professional environment and, at the very least, a familiarity with it is expected in any job you’ll encounter. Excel does great with crunching numbers; visualizing data; reading, importing, and exporting CSV files commonly used in data science; and much more.
Exchange Exchange is the process of obtaining a desired product from someone by offering something in return.
Exclusive Distribution Exclusive distribution means severely limiting the number of intermediaries, in order to maintain control over the service level and outputs offered by resellers.
Executive BI The practice of collecting, analyzing, and visualizing data to deliver key insights to the executive level resources that help drive business change.
Executive Dashboard Allows the executive team to gain instant insight into the big picture of an entire organization from finance and operations to sales and marketing. Executive Dashboards (such as those Jet Reports delivers through Jet Analytics) are fully customizable to display the data that matters to your executive team.
Expectancy-value Model In the expectancy-value model, consumers evaluate products and services by combining their brand beliefs, positive and negative, according to their weighted importance.
Expected Product An expected product is a set of attributes and conditions buyers normally expect when they purchase this product.
Experience Curve (learning curve) The experience curve (learning curve) is a decline in the average cost with accumulated production experience.
Experimental Research Experimental research is the most scientifically valid research, designed to capture cause-and-effect relationships by eliminating competing explanations of the observed findings.
Explainable AI (X.A.I) Also known as transparent AI, explainable AI refers to artificial intelligence that carries out actions which are easily understood by humans and can be trusted.
Exploratory Analysis Exploratory data analysis (EDA) is a term for certain kinds of initial analysis and findings done with data sets, usually early on in an analytical process. Some experts describe it as “taking a peek” at the data to understand more about what it represents and how to apply it. Exploratory data analysis is often a precursor to other kinds of work with statistics and data.
Export An export is a function of international trade whereby goods produced in one country are shipped to another country for future sale or trade. Exports are a crucial component of a country’s economy, as the sale of such goods adds to the producing nation's gross output.
External Data External data is data that is stored outside the current database. External data may be data that you store in another Microsoft Access database, or it might be data that you store in a multitude of other file formats-including ISAM (Indexed Sequential Access Method), spreadsheet, ASCII, and more.


F-Score The F score, also called the F measure, is a measure of a test’s accuracy. It is defined as the weighted harmonic mean of the test’s precision and recall; the common F1 score weights the two equally.
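A small sketch of the calculation from raw counts. The function name and the example counts are hypothetical. `beta` is the usual weighting parameter: `beta=1` gives the F1 score, and larger values weight recall more heavily.

```python
def f_score(true_pos, false_pos, false_neg, beta=1.0):
    """Weighted harmonic mean of precision and recall (F-beta score)."""
    if true_pos == 0:
        return 0.0  # no correct positive predictions: score is zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: precision = 8/10 = 0.8, recall = 8/16 = 0.5.
f1 = f_score(true_pos=8, false_pos=2, false_neg=8)
print(round(f1, 4))  # 0.6154
```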
F-Test An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled.
Fact Table Fact tables consist of the measurements, metrics, or facts of a business process. They are located at the center of a star schema or a snowflake schema surrounded by dimension tables. Fact tables provide the (usually) additive values that act as independent variables by which dimensional attributes are analyzed. Fact tables are often defined by their grain. The grain of a fact table represents the most atomic level by which the facts may be defined. For example, the grain of a sales fact table might be stated as “Sales volume by Day by Product by Store”.
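A minimal star-schema sketch in SQLite, using the "Sales volume by Day by Product by Store" grain as the example. All table names, column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name   TEXT);

-- Fact table grain: one row per day, product, and store.
CREATE TABLE fact_sales (
    sale_date   TEXT,
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    units_sold  INTEGER
);

INSERT INTO dim_product VALUES (1, 'Mug'), (2, 'Kettle');
INSERT INTO dim_store   VALUES (1, 'Downtown'), (2, 'Airport');
INSERT INTO fact_sales VALUES
    ('2024-01-01', 1, 1, 5),
    ('2024-01-01', 1, 2, 3),
    ('2024-01-02', 2, 1, 4),
    ('2024-01-02', 1, 1, 2);
""")

# Sum the additive measure (units_sold) by a dimension attribute.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)  # [('Kettle', 4), ('Mug', 10)]
```

The fact table holds only keys and numeric measures. Descriptive text lives in the dimension tables and is joined in at query time.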
Factor Analysis Factor analysis is a useful tool for investigating variable relationships for complex concepts such as socioeconomic status, dietary patterns, or psychological scales. It allows researchers to investigate concepts that are not easily measured directly by collapsing a large number of variables into a few interpretable underlying factors.
Fad A fad is a craze that is unpredictable, short-lived, and without social, economic, and political significance.
False Negative A false negative is a test result that indicates a person does not have a disease or condition when the person actually does have it, according to the National Institutes of Health (NIH). False negative test results can occur in many different medical tests, from tests for pregnancy, tuberculosis or Lyme disease to tests for the presence of drugs or alcohol in the body.
False Positive A false positive is where you receive a positive result for a test, when you should have received a negative result. It’s sometimes called a “false alarm” or “false positive error.” It’s usually used in the medical field, but it can also apply to other arenas (like software testing).
Family Brand A family brand is a situation in which the parent brand is already associated with multiple products through brand extensions.
Feature Engineering The process of taking knowledge we have as humans and translating it into a quantitative value that a computer can understand. For example, we can translate our visual understanding of the image of a mug into a representation of pixel intensities.
Feature Hashing In machine learning, feature hashing, also known as the hashing trick, is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix.
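A bare-bones sketch of the hashing trick for text tokens. The bucket count, the choice of MD5 as the hash function, and the sample tokens are assumptions made for illustration. Production implementations typically use a faster non-cryptographic hash and often a sign bit to reduce collision bias.

```python
import hashlib


def hash_features(tokens, n_buckets=8):
    """Map tokens to a fixed-length count vector using a hash of each token."""
    vector = [0] * n_buckets
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Use the first 4 bytes of the digest as an integer, then wrap
        # it into the available buckets.
        index = int.from_bytes(digest[:4], "big") % n_buckets
        vector[index] += 1
    return vector


tokens = ["data", "science", "data", "lake"]
vector = hash_features(tokens, n_buckets=8)
print(vector)  # a length-8 list of counts that sum to 4
```

No vocabulary is stored, so memory stays fixed. Distinct tokens can collide in the same bucket.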
Feature Reduction Dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. Approaches can be divided into feature selection and feature extraction.
Feature Selection Feature selection is the process where you automatically or manually select those features which contribute most to the prediction variable or output in which you are interested. Having irrelevant features in your data can decrease the accuracy of the models and make your model learn based on irrelevant features.
Features Features are things that enhance the basic function of a product.
Few-shot Learning Few-shot learning refers to the training of machine learning algorithms using a very small set of training data instead of a very large set. This is most suitable in the field of computer vision, where it is desirable to have an object categorization model work well without thousands of training examples.
Field Transformation These enable an administrator to transform raw field input into standardized values that are more meaningful to an organization. Transformations are controlled by rules and can be configured for use in queries. For example, if a measurement variable does not fit a normal distribution or has greatly different standard deviations in different groups, field transformation may be necessary—the transformation of data at the field type level. Data transformations are an important tool for proper statistical analysis; to those with a limited knowledge of statistics, however, they may seem a bit fishy, a form of playing around with your data in order to get the answer you want. It is therefore essential that you be able to defend your use of field and data transformations, as well as document them.
Filter A setting that allows you to alter the data that is displayed in your reports. If you have a report with page URLs and only want to see the URLs from your blog, you should type in blog.YOURCOMPANY.com to view only your blog posts. The process of only showing these posts is called filtering.
Financial Reporting The process of producing statements that disclose an organization’s financial status to management, investors, and the government. For Financial Reporting best practices, see our ‘Do This, Not That: Financial Reporting’ blog post.
First and Last Interaction This is an Attribution Report model in HubSpot that gives 50% of the credit for the conversion to the first URL and 50% of the credit for the conversion to the last URL.
First and Last Touch This is an Attribution Report model that gives 50% of the credit for the conversion to the first referring URL or source and 50% of the credit for the conversion to the last referring URL or source.
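The 50/50 split described above can be sketched in a few lines. The function name `first_and_last_touch` and the handling of a journey with a single touchpoint (which here receives full credit) are assumptions for illustration, not the behavior of any particular reporting tool.

```python
def first_and_last_touch(touchpoints):
    """Split conversion credit 50/50 between the first and last touchpoint."""
    if not touchpoints:
        return {}
    credit = {}
    first, last = touchpoints[0], touchpoints[-1]
    credit[first] = credit.get(first, 0.0) + 0.5
    # If first and last are the same source, it accumulates the full credit.
    credit[last] = credit.get(last, 0.0) + 0.5
    return credit


print(first_and_last_touch(["organic search", "email", "paid social"]))
```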
First Interaction This is an Attribution report model in Google Analytics that gives 100% credit to the first touchpoint before a conversion.
First Interaction (or First-Click) First interaction gives credit for a conversion to the first method that somebody used to find your website. The ‘Model Comparison Tool’ allows you to apply the first interaction (and other attribution models) to your conversions. It’s important to know that there is a limit to the amount of historical data included in the attribution reports. There will also be other impacts on first interaction data, for example, people clearing their cookies or using multiple devices.
First Touch This is an Attribution Report model that gives 100% of the credit to the first URL or source visited by a contact on your site.
Fixed Costs (overhead) Costs that do not vary with production or sales revenue.
Flexible Market Offering An offering made up of (1) a naked solution containing the product and service elements that all segment members value, and (2) discretionary options that some segment members value.
Flexible Deployment Flexible Deployment refers to a new generation of software that allows complete flexibility on how and where your applications run. They can be run on your own servers, as an appliance (a combination of hardware and software delivered and preconfigured to work straight away) or in the cloud, with all the benefits of running in the cloud.
Flume Flume is a reliable, distributed, and available service for collecting, aggregating, and transferring large amounts of data into HDFS. It is robust, and its flexible architecture is based on streaming data flows.
Focus Group A gathering of six to ten people who are carefully selected based on certain demographic, psychographic, or other considerations and brought together to discuss various topics of interest.
Forecasting Forecasting is determining what is going to happen in the future by analyzing what happened in the past and what is going on now. It is a planning tool that helps business people cope with the uncertainty of what might or might not occur. Forecasting relies on past and current data and analysis of trends.
Forward Invention Creating a new product to meet a need in another country.
Frequency Programs (FPs) Programs designed to provide rewards to customers who buy frequently and in substantial amounts.
Frequentist Statistics Frequentist inference is a type of statistical inference that draws conclusions from sample data by emphasizing the frequency or proportion of the data. An alternative name is frequentist statistics.
Friendly Artificial Intelligence (FAI) If the values of an artificial general intelligence are aligned with our own, then it is known as friendly AI. In this hypothetical scenario, a friendly artificial intelligence would have a positive effect on humanity.
Front End The front end is everything a client or user gets to see and interact with directly. This includes data dashboards, web pages, and forms.
Full Demand A state in which consumers are adequately buying all products put into the marketplace.
Full Load A means of reading and updating all records in a data source during warehouse loading. When it comes to loading data into a data warehouse, there are two main techniques: full load and incremental load.
Funnel Funnel is defined as the steps a user takes from the moment they interact with your brand to the time they become a customer. By monitoring funnel analytics, marketers can find unique opportunities to improve the conversion process. For example, an online retailer could see that most customers are dropping off during the checkout process in their funnel and could optimize this process based on the data.
Fuzzy Algorithms Algorithms that use fuzzy logic to decrease the runtime of a script. Fuzzy algorithms tend to be less precise than those that use Boolean logic. They also tend to be faster, and computational speed sometimes outweighs the loss in precision.
Fuzzy Logic Fuzzy logic is a logic operations method based on many-valued logic rather than binary logic (two-valued logic). Two-valued logic often considers 0 to be false and 1 to be true. However, fuzzy logic deals with truth values between 0 and 1, and these values are considered as intensity (degrees) of truth.
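To make the idea of degrees of truth concrete, here is a minimal sketch using the commonly used min/max/complement operators for fuzzy AND, OR and NOT. The function names and the example truth values are assumptions chosen for illustration; other operator families exist.

```python
def fuzzy_and(a, b):
    """Degree to which both statements are true (minimum operator)."""
    return min(a, b)


def fuzzy_or(a, b):
    """Degree to which at least one statement is true (maximum operator)."""
    return max(a, b)


def fuzzy_not(a):
    """Degree to which a statement is false (complement)."""
    return 1.0 - a


# Hypothetical truth values: "the room is warm" and "the room is humid".
warm, humid = 0.7, 0.4
print(fuzzy_and(warm, humid), fuzzy_or(warm, humid), fuzzy_not(0.25))
```

With inputs restricted to exactly 0 and 1, these operators reduce to ordinary two-valued logic.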


Gamification Gamification refers to the use of game design principles to improve customer engagement in non-game businesses. The specific methods used range from the creation of reward schedules to creating levels of achievement via status and badges. Companies use gaming principles to increase interest in a product or service, or simply to deepen their customers' relationship with the brand.
Gap Analysis A study of whether the data that a company has can meet the business expectations that the company has set for its reporting and BI, and where possible data gaps or missing data might exist.
Gated Recurrent Unit (GRU) A gated recurrent unit (GRU) is part of a specific model of recurrent neural network that intends to use connections through a sequence of nodes to perform machine learning tasks associated with memory and clustering, for instance, in speech recognition. Gated recurrent units help to adjust neural network input weights to solve the vanishing gradient problem that is a common issue with recurrent neural networks.
Gauges Devices for measuring the magnitude, amount, or contents of something, typically with a visual display of such information.
General Intelligence The capability to achieve pretty much any goal, including the ability to learn. Should be distinguished from artificial general intelligence which is the ability of AI to accomplish any cognitive task to the same level as humans.
Generative adversarial networks (GAN) A type of neural network that can generate images that appear authentic to human eyes, at least on the surface. GAN-generated images take elements of photographic data and shape them into realistic-looking images of people, animals, and places.
Generics Unbranded, plainly packaged, less expensive versions of common products such as spaghetti, paper towels, and canned peaches.
Genetic algorithm An algorithm based on principles of genetics that is used to efficiently and quickly find solutions to difficult problems.
Geospatial Analytics Geospatial Analytics are used to analyze data about physical objects that are linked to a geographical location. This usually means data on imagery, GPS, satellite photography and historical data, that can be used to describe exact geographic coordinates or build a picture of an area in terms of a street address, postal code or any other identifiers, as they are applied to geographic models.
Ggplot2 ggplot2 is a data visualization package for the statistical programming language R. Created by Hadley Wickham in 2005, ggplot2 is an implementation of Leland Wilkinson's Grammar of Graphics—a general scheme for data visualization which breaks up graphs into semantic components such as scales and layers.
Global Firm A firm that operates in more than one country and captures R&D, production, logistical, marketing, and financial advantages in its costs and reputation that are not available to purely domestic competitors.
Global Industry An industry in which the strategic positions of competition in major geographic or national markets are fundamentally affected by their overall global positions.
Global Site Tag (or gtag.js) The global site tag (or gtag.js) is the current version of the stand-alone Google Analytics tracking code. Generally, you will want to use Google Tag Manager to implement Google Analytics on your website. However, you do have the option of using the Google Analytics tracking code instead.
Go Go is an open source programming language that makes it easy to build simple, reliable, and efficient software. Go is a statically typed language in the tradition of C.
Goal Goals are used to track desired actions on your website. For example, subscribing to your email newsletter, submitting an inquiry or registering as a member. Goals can be configured inside Google Analytics and can be based on people traveling to a particular page (or pages), triggering an event, sessions of a certain duration or viewing a certain number of pages.
Goal Abandonment Destination (or page-based) goals can be configured to include additional pages leading to a conversion (funnel steps). If somebody views at least one of the funnel steps without converting, they will be considered as abandoning the goal and be included in the goal abandonment metric.
Goal Completion When a user converts for a particular goal during a session they’ll be counted as a goal completion. If a goal is completed multiple times during a user’s session, it will only be counted as a single conversion.
Goal Completion Location This dimension reports the particular page where a conversion occurred for a destination (or page-based) goal. This is especially useful if you’re including multiple conversion pages for a goal. The goal completion location will also show you the page that was viewed when an event-based or engagement-based (duration and pages per session) goal was triggered.
Goal Formulation The process of developing specific goals for the planning period.
Goal Value An optional dollar value can be set for each goal inside Google Analytics. The goal value can be used to report on an actual dollar value, a calculated value or a symbolic value for each conversion. The event-based goal allows you to pull the event’s ‘value’, the other goal types use a fixed (or static) value for each conversion.
Goodness of Fit The goodness of fit test is used to test if sample data fits a distribution from a certain population (i.e. a population with a normal distribution or one with a Weibull distribution). In other words, it tells you if your sample data represents the data you would expect to find in the actual population.
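One common goodness of fit test is the chi-square test, which compares observed counts with the counts expected under the assumed distribution. The sketch below computes only the test statistic; the die-roll counts are made-up example data, and the function name `chi_square_statistic` is an assumption for this illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 120 rolls of a die that is assumed to be fair.
observed = [18, 22, 20, 25, 15, 20]
expected = [20] * 6

stat = chi_square_statistic(observed, expected)
print(round(stat, 2))
```

The statistic would then be compared with a critical value from the chi-square distribution for the appropriate degrees of freedom (here, number of categories minus one); a small statistic means the sample is consistent with the assumed distribution.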
Google Data Studio Google's reporting and dashboarding tool allows you to present and visualize data from Google Analytics, Google Sheets and other data sources.
Google Optimize Google's platform for A/B testing, multivariate testing and personalization. Google Optimize allows you to present different variations of content on your website to increase conversions and improve conversion rate.
Google Signals You can begin collecting data into the automated Cross Device reports by enabling Google signals in Google Analytics. Google signals uses aggregated and anonymized data from people logged into their Google account to understand how people engage with your website using multiple devices.
Google Tag Manager A system for managing the deployment of tracking and other tags on your website. Google Tag Manager allows tags to be tested on your website before being deployed live and is designed to reduce the dependence on IT for managing tracking tags.
Governed Data Information and data processes that are managed, controlled and secured by a governing department, typically IT, to fit business rules and standards before users may access it. This helps to ensure data integrity; in effect, users will only be working with trusted, credible data.
Gradient Descent Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient of the function at the current point.
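A minimal sketch of the update rule described above, applied to the one-dimensional function f(x) = (x - 3)^2. The function name `gradient_descent`, the learning rate of 0.1 and the step count are assumptions chosen for this example.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        # Step size is proportional to the negative of the gradient at x.
        x = x - learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)
```

If the learning rate is too large the iterates can overshoot and diverge; if it is too small, convergence becomes slow.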
Graph Database A graph database is a type of NoSQL or non-relational database, which is a type of database suitable for very large sets of distributed data. Instead of using tables like those found in relational databases, a graph database, as the name suggests, uses graph structures with nodes, properties and edges in order to represent and store data.
Gray Market Branded products diverted from normal or authorized distribution channels in the country of product origin or across international borders.
Greedy Algorithms A greedy algorithm will break a problem down into a series of steps. It will then look for the best possible solution at each step, aiming to find the best overall solution available. A good example is Dijkstra’s algorithm, which looks for the shortest possible path in a graph.
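As an illustration of the greedy choice in Dijkstra's algorithm, the sketch below always expands the unvisited node with the smallest known distance. It assumes non-negative edge weights and a graph given as a dictionary of dictionaries; the example graph and node names are made up.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        # Greedy step: take the closest node discovered so far.
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_dist = d + weight
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return dist


graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
print(dijkstra(graph, "A"))
```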
Grid Computing Grid computing is the use of a collection of computer resources from various domains or multiple distributed systems to reach a specific goal. A grid is designed to solve big problems while maintaining process flexibility. Grid computing is often used in scientific and marketing research, structural analysis, and web services such as back-office infrastructures or ATM banking.


Hadoop Hadoop is an open source distributed processing framework that manages data processing and storage for big data applications running in clustered systems. It is at the center of a growing ecosystem of big data technologies that are primarily used to support advanced analytics initiatives, including predictive analytics, data mining and machine learning applications.
Hadoop Distributed File System (HDFS) Hadoop Distributed File System (HDFS) is the primary data storage layer used by Hadoop applications. It employs a NameNode and DataNode architecture to implement a distributed, Java-based file system that supplies high-performance access to data across highly scalable Hadoop clusters. It is designed to be highly fault-tolerant.
Hadoop User Experience Hadoop User Experience (HUE) is an open source interface which makes Apache Hadoop’s use easier. It is a web-based application. It has a job designer for MapReduce, a file browser for HDFS, an Oozie application for making workflows and coordinators, an Impala, a shell, a Hive UI, and a group of Hadoop APIs.
HAMA Hama is basically a distributed computing framework for big data analytics based on Bulk Synchronous Parallel strategies for advanced and complex computations like graphs, network algorithms, and matrices. It is a Top-level Project of The Apache Software Foundation.
HBase HBase is a column-oriented non-relational database management system that runs on top of Hadoop Distributed File System (HDFS). HBase provides a fault-tolerant way of storing sparse data sets, which are common in many big data use cases. It is well suited for real-time data processing or random read/write access to large volumes of data.
Hedonic Bias The general tendency of people to attribute success to themselves and failure to external causes.
Heuristic A computer science technique designed for quick problem solving, trading a guarantee of the optimal solution for one that is good enough.
Hidden Markov Model Hidden Markov Models (HMMs) are among the most common models used for dealing with temporal data. They also frequently come up in data science interviews, usually without being named as HMMs. In such a scenario it is necessary to recognize the problem as an HMM problem by knowing the characteristics of HMMs.
Hierarchical Clustering Hierarchical clustering, also known as hierarchical cluster analysis, is an algorithm that groups similar objects into groups called clusters. The endpoint is a set of clusters, where each cluster is distinct from each other cluster, and the objects within each cluster are broadly similar to each other.
Hierarchy Refers to a means of organizing different levels of a dimension by granularity; usually from largest to smallest. For a date, a typical hierarchy would be organized by year, quarter, month and day. Country, state, city and customer is another example of hierarchy levels within a cube.
High Performance Computing High-performance computing (HPC) is the use of supercomputers and parallel processing techniques for solving complex computational problems. HPC technology focuses on developing parallel processing algorithms and systems by incorporating both administration and parallel computational techniques.
High-low Pricing Charging higher prices on an everyday basis but then running frequent promotions and special sales.
High-Performance Analytical Application (HANA) High-performance Analytical Application (HANA) is an in-memory computing platform from SAP, a combined software/hardware scheme for large-volume transactions and real-time data analytics.
Histograms A histogram is a display of statistical information that uses rectangles to show the frequency of data items in successive numerical intervals of equal size. In the most common form of histogram, the independent variable is plotted along the horizontal axis and the dependent variable is plotted along the vertical axis. The data appears as colored or shaded rectangles of variable area.
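The counting behind a histogram can be sketched as follows: values are assigned to equal-width intervals and the number of values in each interval becomes the height of its rectangle. The function name `histogram_counts`, the bin boundaries and the sample values are assumptions for this example.

```python
def histogram_counts(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v <= high:
            # The upper edge is included in the last bin.
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    return counts


values = [1, 2, 2, 3, 5, 6, 6, 7, 9, 10]
counts = histogram_counts(values, low=0, high=10, n_bins=5)
for i, count in enumerate(counts):
    print(f"{i * 2:>2}-{i * 2 + 2:<2} {'#' * count}")
```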
Hit A hit is a name for user interactions. Examples of hits include pageviews, transactions, items, events, social interactions, and user timing. This term is used in Google Analytics.
Hive Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy. Initially Hive was developed by Facebook, later the Apache Software Foundation took it up and developed it further as an open source under the name Apache Hive. It is used by different companies. For example, Amazon uses it in Amazon Elastic MapReduce.
Holdout Sample A holdout sample is a sample of data not used in fitting a model, used to assess the performance of that model. The terms validation set or, if one is used in the problem, test set are often used instead of holdout sample.
Holistic Marketing Concept A concept based on the development, design, and implementation of marketing programs, processes, and activities that recognizes their breadth and interdependencies.
Holt-Winters Forecasting Holt-Winters is one of the most popular forecasting techniques for time series. The model predicts future values by computing the combined effects of both trend and seasonality. The idea behind Holt-Winters forecasting is to apply exponential smoothing to the seasonal components in addition to level and trend.
Horizontal Marketing System An arrangement in which two or more unrelated companies put together resources or programs to exploit an emerging market opportunity.
Hostname The part of your website’s URL that identifies where the Google Analytics tracking code was loaded. For example, if someone viewed https://www.example.com/contact then Google Analytics would report on www.example.com as the hostname. Viewing the hostnames in Google Analytics can be especially useful if you’ve installed the tracking code on multiple domains (or subdomains).
Hub-and-Spoke System A product-management organization where the brand or product manager is figuratively at the center, with spokes leading to various departments representing working relationships.
Human Resources Dashboard (HR dashboard) Analyzes and presents enterprise data for the purpose of displaying meaningful HR KPIs. Employee Satisfaction Index, number of FTEs, turnover rate, employee benefit usage, and cost per employee are examples of data points that can be visualized and displayed via an HR dashboard.
Human-Level AI Another term for artificial general intelligence, human-level AI is the point at which a non-biological intelligence is able to complete any cognitive task that humans are capable of.
Hybrid Channels The use of multiple channels of distribution to reach customers in a defined market.
Hyperparameter A hyperparameter is a parameter that is set before the learning process begins. These parameters are tunable and can directly affect how well a model trains.
Hyperplane A hyperplane is a concept in geometry. It is a generalization of the plane into a different number of dimensions. A hyperplane of an n-dimensional space is a flat subset with dimension n - 1. By its nature, it separates the space into two half spaces.
Hypothesis A hypothesis, in a scientific context, is a testable statement about the relationship between two or more variables or a proposed explanation for some observed phenomenon. In a scientific experiment or study, the hypothesis is a brief summation of the researcher's prediction of the study's findings, which may be supported or not by the outcome. Hypothesis testing is the core of the scientific method.


Image recognition The process of identifying or detecting an object or feature of an object in an image or video.
Impala Impala is a type of software tool that is known as a query engine. It is licensed by Apache and runs on the open-source Apache Hadoop big data analytics platform.
Impressions Often seen in Facebook and Twitter advertising, the number of impressions equals the number of times a piece of content has been viewed. For example, a post on Twitter could receive 200 impressions if it has been viewed by 200 Twitter users. Remember, not everyone who receives a tweet in their stream will see it. So, you should also consider the Exposure, which reflects the potential reach that your content has, often called potential impressions. For Twitter, this would be the number of times your Tweet could appear in users’ Twitter feeds.
Imputation Imputation, in statistics, is the insertion of a value to stand in for missing data. Analytics programs and methods don't function properly with missing data. Statistical packages, for example, commonly delete any case with data missing.
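Mean imputation is one simple way to fill in missing values. The sketch below assumes missing entries are represented as `None` and replaces each with the mean of the observed values; the function name `impute_mean` and the sample data are made up for illustration.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute when every value is missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


print(impute_mean([10, None, 14, None, 12]))
```

Mean imputation keeps every case in the analysis but understates the variability of the data, so it is usually treated as a baseline rather than a recommended method.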
In-Memory In-memory refers to using a computer’s random access memory (RAM) as opposed to its hard disk drives or flash memory storage. RAM is orders of magnitude faster than a hard disk drive, and therefore software that can run in-memory without having to wait to load data from disk can run many times faster than software relying on disk.
In-Memory Analytics Refers to the process of querying data when it resides in computer memory (such as RAM) rather than on a physical storage device, such as a hard drive. In-memory querying is considerably faster than the alternatives, resulting in expedited business decisions driven by data from business intelligence applications. As the cost of RAM continues to decline, large-scale in-memory analytics is becoming a more feasible option for many organizations.
In-memory Computing In-memory computing is the storage of information in the main random access memory (RAM) of dedicated servers rather than in complicated relational databases operating on comparatively slow disk drives. ... This has made in-memory computing economical among a wide variety of applications.
In-Memory Database An in-memory database is a database that keeps the whole dataset in RAM. This means that whenever you query a database or update data in a database, you only access the main memory. So, it’s a lot faster as there’s no disk to slow it down.
Incremental Load Contrasted with full load, incremental load is a means of data warehouse loading that involves only loading new or updated records. Incremental loads are useful because they run very efficiently when compared to full loads and allow for more frequent updating of the data warehouse and cubes, particularly with large data sets.
Index A data structure that stores the values for a specific column in a table. Indexing is a way of sorting a number of records on multiple fields. For example, creating an index on a field in a table creates another data structure which holds the field value, and pointer to the record it relates to. This index structure is then sorted, allowing rapid Binary Searches to be performed on it.
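The index structure described above can be sketched as a sorted list of (field value, record position) pairs that is searched with binary search. The function names `build_index` and `lookup` and the sample records are assumptions for this example; real database indexes usually use tree structures rather than a flat sorted list.

```python
from bisect import bisect_left


def build_index(records, field):
    """Sorted list of (field value, position of the record it points to)."""
    return sorted((rec[field], pos) for pos, rec in enumerate(records))


def lookup(index, records, value):
    """Binary-search the index, then follow the pointers to the records."""
    i = bisect_left(index, (value,))  # first entry whose value is >= value
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(records[index[i][1]])
        i += 1
    return matches


records = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Bergen"},
    {"id": 3, "city": "Oslo"},
]
city_index = build_index(records, "city")
print(lookup(city_index, records, "Oslo"))
```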
Industry A group of firms that offer a product or class of products that are close substitutes for one another.
Inferential Statistics Inferential statistics is a statistical method that deduces from a small but representative sample the characteristics of a bigger population. In other words, it allows the researcher to make assumptions about a wider group, using a smaller portion of that group as a guideline.
Informational Appeal An appeal that elaborates on product or service attributes or benefits.
Ingredient Branding A special case of co-branding that involves creating brand equity for materials, components, or parts that are necessarily contained within other branded products.
Inmon Approach A top-down methodology for data warehousing. It states that the data warehouse should be modeled using normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.). The normalized structure divides data into entities, which creates several tables in a relational database. When applied in large enterprises, the result is dozens of tables that are linked together by a web of joins.
Innovation Any good, service, or idea that is perceived by someone as new.
Innovation Diffusion Process The spread of a new idea from its source of invention or creation to its ultimate users or adopters.
Installation The work done to make a product operational in its planned location.
Institutional Market Schools, hospitals, nursing homes, prisons, and other institutions that must provide goods and services to people in their care.
Integrated Logistics Systems (ILS) Materials management, material flow systems, and physical distribution, abetted by information technology (IT).
Integrated Marketing Mixing and matching marketing activities to maximize their individual and collective efforts.
Integrated Marketing Channel System A system in which the strategies and tactics of selling through one channel reflect the strategies and tactics of selling through one or more other channels.
Integrated Marketing Communications (IMC) A concept of marketing communications planning that recognizes the added value of a comprehensive plan.
Intelligence Explosion Intelligence explosion describes a scenario in which rapidly occurring improvements to general intelligence put AI on a clear path to superintelligence. It will almost certainly be clear when an intelligence explosion is underway and at this point, the intelligence of AI will rapidly begin to surpass and accelerate beyond even the smartest humans. The recursive nature of an intelligence explosion highlights why it’s so important for us to build safety protocols and safeguards into the development of AI before we reach this point of no return.
Intensive Distribution The manufacturer placing the goods or services in as many outlets as possible.
Interaction Score The interaction score is in HubSpot and tells the Attribution Report what data to look at as well as what Attribution model to use in the analysis. For example, the Attribution Report can be pulled by URL, referrer, or source. And then the different model can be chosen.
Interactive Visualization Interactive visualization technology enables the exploration of data via the manipulation of chart images, with the color, brightness, size, shape and motion of visual objects representing aspects of the dataset being analysed. These products provide an array of visualization options that go beyond those of pie, bar and line charts, including heat and tree maps, geographic maps, scatter plots and other special-purpose visuals.
Interests You can view your audience’s areas of interest by enabling ‘Advertising Features’ (navigate to ‘Admin’, then ‘Tracking Info’ and selecting ‘Data Collection’). The categories within the Interests reports align to the Interest targeting options available in Google Ads.
Internal Branding Activities and processes that help to inform and inspire employees.
Internal Marketing An element of holistic marketing; the task of hiring, training, and motivating able employees who want to serve customers well.
Interstitials Advertisements, often with video or animation, that pop up between changes on a Web site.
IoT (Internet of Things) The internet of things (IoT) is a computing concept that describes the idea of everyday physical objects being connected to the internet and being able to identify themselves to other devices. The term is closely identified with RFID as the method of communication, although it also may include other sensor technologies, wireless technologies or QR codes.
IQR Interquartile range (IQR) is a measure of where the “middle fifty” is in a data set. Where a range is a measure of where the beginning and end are in a set, an interquartile range is a measure of where the bulk of the values lie. That’s why it’s preferred over many other measures of spread (e.g. the range) when reporting things like school performance or SAT scores.
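As a small worked example, the IQR can be computed as the third quartile minus the first quartile. The sketch below uses `statistics.quantiles` from the Python standard library (Python 3.8+) with the "inclusive" method; quartile conventions differ between tools, so other software may report slightly different values for the same data. The function name `interquartile_range` and the sample data are assumptions.

```python
from statistics import quantiles


def interquartile_range(data):
    """Return Q3 - Q1, the width of the middle fifty percent of the data."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(interquartile_range(scores))
```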
Irregular Demand A state in which consumer purchases vary on a seasonal, monthly, weekly, daily, or even hourly basis.
Iteration Iteration refers to the number of times an algorithm’s parameters are updated while training a model on a dataset. For example, each iteration of training a neural network takes a certain number of training examples and updates the weights by using gradient descent or some other weight update rule.


Java Java is a general purpose, high-level programming language developed by Sun Microsystems (now owned by Oracle). It was designed from the ground up to be a pure object-oriented language with a syntax similar to C++ but with some of the difficult aspects from C, such as memory allocation taken care of. It was also designed to be platform independent, compiling ‘byte code’ instructions for the Java Virtual Machine (JVM) that runs anywhere.
Jet Analytics (Previously Jet Enterprise) A corporate analytics and reporting platform that delivers fast and flexible dashboards and financial reports in Excel on the web. Offers pre-built cubes, data warehouse, dashboards, and more.
Jet Basics (Previously Jet Express) A free extension included with Microsoft Dynamics NAV and GP to create basic reports inside of Excel (Jet Basics) or Word (Jet Express for Word). Created in collaboration with Microsoft.
Jet Data Manager (JDM) The Data Warehouse Automation (DWA) platform for Jet Analytics. With the JDM you can transform and validate data from different data sources, consolidate them in a data warehouse, and build Online Analytical Processing (OLAP) cubes using a drag-and-drop interface. When you execute or deploy a BI project, the Jet Data Manager also automatically generates the underlying SQL code of your solution. With the Jet Data Manager you can access data from a variety of sources including ERP systems (such as Microsoft Dynamics), CRM applications, SQL databases, spreadsheets, and plain text files. The data is stored in a data warehouse in Microsoft SQL Server and can be viewed in your preferred front-end application such as Excel, Microsoft’s Power BI, the Jet Analytics mobile dashboard builder, or other third party applications.
Jet Reports (Previously Jet Professional) A fast, flexible financial and business reporting solution inside of Excel. Allows users to drill down on the numbers directly from Excel and access reports anywhere on the web.
Jobbers Small-scale wholesalers who sell to small retailers.
Joint Application Development (JAD) Joint Application Development (JAD) is a methodology that involves the client or end user in the design and development of an application, through a succession of collaborative workshops called JAD sessions.
Joint Venture A company in which multiple investors share ownership and control.
Julia Julia is a high-level programming language designed for high-performance numerical analysis and computational science. Distinctive aspects of Julia's design include a type system with parametric polymorphism in a fully dynamic programming language, and multiple dispatch as its core programming paradigm.
Juridical Data Compliance Use of data stored in a country must follow the laws of that country. Relevant when using cloud solutions with data stored in different countries or continents.


K-Means K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e., data without defined categories or groups). The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided.
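The iterative assign-and-update loop described above can be sketched in a few lines. This is a minimal version under stated assumptions: the starting centroids are passed in by the caller (real implementations usually pick them randomly or with a scheme such as k-means++), and the sample points are made up for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Minimal k-means (Lloyd's algorithm) on tuples of numbers."""
    centroids = list(centroids)
    k = len(centroids)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the group of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its group
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters

# Hypothetical data: two visibly separate groups of 2-D points
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
```

Here K is simply the number of starting centroids, so passing two centroids asks for two groups.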
Keras Keras is a simple, high-level neural network library written in Python. It is capable of running on top of TensorFlow and Theano, which makes designing and experimenting with neural networks easier.
Key Value Stores / Key Value Databases A key-value store, or key-value database, is a data storage paradigm designed for storing, managing, and retrieving records by key. Each record is stored as a value, which can be any data type of a programming language, together with a key attribute that identifies the record uniquely. That’s why no fixed data model is required.
Key Value Databases A key-value database (also known as a key-value store and key-value store database) is a type of NoSQL database that uses a simple key/value method to store data. The key-value part refers to the fact that the database stores data as a collection of key/value pairs. This is a simple method of storing data, and it is known to scale well.
Key-Value Store A Key-Value Store is a data storage model designed for storing, retrieving, and managing associative arrays, otherwise known as dictionaries or hash tables. Being far simpler than a relational database (RDBMS), it can be extremely fast and scales well. It also has the advantage of being schema-less, but compared to an RDBMS, key-value stores lack functionality, and queries such as joins are either very slow or not possible.
Keyword Google Analytics provides details about the keywords people use to find your website. The organic keywords report shows you the terms people used to find your website when clicking on a free result from a search engine. A lot of organic keyword traffic is shown as ‘not provided’ which means that the individual keyword was hidden by the search engine (see also not provided). The paid keywords report shows you keywords from linked Google AdWords accounts and campaign tagged URLs using the ‘term’ parameter.
Key Performance Indicator (KPI) A quantifiable measure used to evaluate the success of an organization, employee, etc., in meeting objectives for performance.
Kimball Approach A bottom-up methodology for data warehousing (contrasted with the Inmon approach, which is top-down). Using the Kimball approach, dimensional data marts are first created to provide reporting and analytical capabilities for specific business areas such as “Sales” or “Production”, then combined into a broader data warehouse. It is the most frequently used methodology, especially if you are using the Microsoft BI stack.
kNN K nearest neighbors is a simple algorithm that stores all available cases and classifies new cases based on a similarity measure (e.g., distance functions). KNN has been used in statistical estimation and pattern recognition as a non-parametric technique since the early 1970s.
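A minimal sketch of the idea, assuming Euclidean distance as the similarity measure and a simple majority vote; the training points and labels are invented for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (features, label) pairs; returns the majority label
    among the k stored cases closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical stored cases: two groups labelled "A" and "B"
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B")]

label_near_a = knn_predict(train, (1.5, 1.5))
label_near_b = knn_predict(train, (6.5, 6.5))
```

Nothing is "trained" in advance: all cases are stored, and the work happens when a new case is classified.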
Kurtosis Kurtosis is a statistical measure that defines how heavily the tails of a distribution differ from the tails of a normal distribution. In other words, kurtosis identifies whether the tails of a given distribution contain extreme values.
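One common way to put a number on this is excess kurtosis: the fourth central moment divided by the squared variance, minus 3 so that a normal distribution scores about 0. The sketch below uses population moments; other estimators apply sample-size corrections, so treat this as one convention rather than the only definition.

```python
def excess_kurtosis(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance (population)
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Made-up data: no extreme values vs. a couple of extreme values in the tails
light_tails = excess_kurtosis([-1, 1, -1, 1])
heavy_tails = excess_kurtosis([0] * 8 + [-10, 10])
```

A positive result points to heavier tails than a normal distribution, a negative result to lighter tails.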


Labeled Data Labeled data is a group of samples that have been tagged with one or more labels. Labeling typically takes a set of unlabeled data and augments each piece of that unlabeled data with meaningful tags that are informative.
Labels Labels allow marketers to save valuable customer segments (see also segments). By creating labels, marketers can quickly reference specific segments and set criteria for additional customers to be dynamically added as they meet the criteria. For example, a company could create a label called “At-Risk of Leaving” and include clients who have submitted a particular number of support tickets or who are coming to the end of their subscription.
Landing Page The landing page is the first page viewed during a session, or in other words, the entrance page. It can be useful to review your landing pages to understand the most popular pages people view as they navigate to your website. This can be used to identify potential opportunities to cross-promote or feature other content from your website.
Lasso Regression Lasso regression is a type of linear regression that uses shrinkage. Shrinkage is where data values are shrunk towards a central point, like the mean. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters). This particular type of regression is well-suited for models showing high levels of multicollinearity or when you want to automate certain parts of model selection, like variable selection/parameter elimination.
Last AdWords Click This is an Attribution Report model in Google Analytics that gives 100% of the credit for the conversion to the last AdWords click.
Last Interaction Last Interaction Attribution is also referred to as "last-click" or "last-touch." As the name implies, this model gives 100% of the credit to the last interaction your business had with a lead before they convert. For example, a visitor finds your website through organic search.
Last Interaction (or Last-Click) When a user converts on your website, the last method they used to find your website is reported as the last interaction leading to the conversion. The ‘Model Comparison Tool’ allows you to attribute conversions to the last interaction to understand the channels that are better at closing (or completing) conversions.
Last Non-Direct Click The Last Non-Direct Click model ignores direct traffic and attributes 100% of the conversion value to the last channel that the customer clicked through from before buying or converting. Analytics uses this model by default when attributing conversion value in non-Multi-Channel Funnels reports.
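To make the difference between the last interaction and last non-direct click models concrete, here is a small sketch that picks the credited channel from a conversion path. The channel names and the list-of-strings representation are assumptions for illustration; this is not how Google Analytics stores or exposes its data.

```python
def last_interaction(path):
    # 100% of the credit goes to the final touchpoint
    return path[-1]

def last_non_direct_click(path):
    # Walk backwards and skip direct traffic
    for channel in reversed(path):
        if channel != "direct":
            return channel
    return "direct"  # every touchpoint was direct, so direct keeps the credit

# Hypothetical conversion path, oldest touchpoint first
path = ["organic search", "email", "direct"]

credited_last = last_interaction(path)
credited_non_direct = last_non_direct_click(path)
```

For the same path, the two models credit different channels: the final direct visit in one case, the email click before it in the other.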
Latency Latency is a networking term for the total time it takes a data packet to travel from one node to another. In other contexts, latency means the round-trip time: the total time for a data packet to be transmitted and returned to its source. More generally, latency refers to the time interval or delay during which one system component waits for another system component to do something.
Latent Demand Consumers may share a strong need that cannot be satisfied by an existing product.
Lead and Lag Analytical (window) functions that give access to other rows in a result set without a self-join. LEAD returns a value from a following row, while LAG returns a value from a preceding row. They are commonly used to calculate the difference between the current row and the next or previous row.
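In SQL, LEAD and LAG each return a value from a neighbouring row, and the difference between rows is then computed from that value. The sketch below mimics that behaviour on a plain Python list; the function names and the sales figures are illustrative only.

```python
def lag(values, offset=1, default=None):
    # Value from the row `offset` positions before the current row
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]

def lead(values, offset=1, default=None):
    # Value from the row `offset` positions after the current row
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]

sales = [100, 120, 90, 150]   # hypothetical monthly sales, in row order
previous = lag(sales)         # [None, 100, 120, 90]
following = lead(sales)       # [120, 90, 150, None]

# Row-over-row change, as typically computed with LAG
change = [cur - prev if prev is not None else None
          for cur, prev in zip(sales, previous)]
```

The first row has no previous row and the last row has no following row, so those positions fall back to the default value.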
Lead generation The use of a computer program, a database, the Internet, or a specialized service to obtain or receive information for the purpose of expanding the scope of business, increasing sales revenues, looking for a job or for new clients or conducting specialized research.
Lead Score Lead Scoring assigns a number (score) to a lead based on the perceived fit for your company and the lead behavior. For example, a visitor who downloads a whitepaper, reads a blog post, attends an event and requests a demo will have a much higher lead score than a person who has only downloaded the whitepaper.
Lean Manufacturing Producing goods with minimal waste of time, materials, and money.
Learning Changes in an individual’s behavior arising from experience.
Level A grouping within a dimension. For example, customers can be grouped by city or country. When grouped in this way, customer, city and country are categorized as different levels within a cube. Similarly, dates can have different levels in a BI cube (day, month, quarter and year is a common example).
Lexicographic Heuristic A consumer choosing the best brand on the basis of its perceived most important attribute.
Licensed Product One whose brand name has been licensed to other manufacturers who actually make the product.
Life Stage A person’s major concern, such as going through a divorce, going into a second marriage, taking care of an older parent, deciding to cohabit with another person, deciding to buy a new home, and so on.
Life-cycle Cost The product’s purchase cost plus the discounted cost of maintenance and repair less the discounted salvage value.
Lifecycle Stage A property that shows where contacts are in your marketing funnel. In HubSpot these lifecycle stages include Subscriber, Lead, Marketing Qualified Lead, Sales Qualified Lead, Opportunity, Customer, Evangelist, and Other.
Lifestyle A person’s pattern of living in the world as expressed in activities, interests, and opinions.
Lifetime Value (LTV) The lifetime value metrics, including lifetime revenue per user and lifetime revenue, show you the total value based on users, instead of sessions.
Limited memory Limited memory describes a type of artificial intelligence that can retain recent observations or historical data for a short period and use them to inform its decisions, in contrast to purely reactive systems that respond only to the current input.
Line Chart Line charts are used to display information as a series of points connected by straight line segments. These charts are used to communicate information visually, such as to show an increase or decrease in the trend in data over intervals of time.
Line Extension The parent brand is used to brand a new product that targets a new market segment within a product category currently served by the parent brand.
Line Stretching A company lengthens its product line beyond its current range.
Linear This is an Attribution report model in Google Analytics that gives each touchpoint in the conversion path equal credit for the conversion. In HubSpot, this is called all interactions.
Linear Regression Linear regression is a kind of statistical analysis that attempts to show a relationship between two variables. Linear regression looks at various data points and plots a trend line. Linear regression can create a predictive model on apparently random data, showing trends in data, such as in cancer diagnoses or in stock prices.
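For two variables, the trend line can be fitted with ordinary least squares: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. This is a minimal sketch with made-up data points, not a full statistical treatment.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)

# Use the fitted line to predict y for a new x
predicted = slope * 5 + intercept
```

With noisy real-world data the points will not sit exactly on the line, and the fitted slope and intercept describe the best straight-line approximation.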
Linked Data Linked data refers to a collection of interconnected datasets that can be shared or published on the web and used by both machines and people. It is highly structured, unlike big data. It is used in building the Semantic Web, in which a large amount of data is available in a standard format on the web.
List Segmentation Market segmentation is the activity of dividing a broad consumer or business market, normally consisting of existing and potential customers, into sub-groups of consumers based on some type of shared characteristics.
Load balancing Load balancing distributes workload between two or more computers over a computer network so that work gets completed in less time and users are served faster. It is the main reason for computer server clustering, and it can be implemented in software, in hardware, or with a combination of both.
Location Analytics Location analytics is the process or the ability to gain insight from the location or geographic component of business data. Data, especially transactional data generated by businesses, often contains a geographical component that, when laid out in a geographical information system, allows for new dimensions of analysis and insights, in this case through a more visual approach.
Location Data Location data are information that a mobile device, like a smartphone or tablet, provides about its current position in space. It can be used in your projects. You can simulate location-based applications using the “Change location” option to activate interactions.
Log File A log file is a file that keeps a record of events, processes, messages and communication between various software applications and the operating system. Log files are produced by executable software, operating systems and programs, which record messages and process details in them. Many applications write a log file in which their activities are noted.
Log Loss Log loss, or logistic loss, is one of the evaluation metrics used to measure how good a classification model is. The lower the log loss, the better the model. Log loss is the negative average of the logarithm of the probabilities the model assigned to the correct classes, which is equivalent to the negative logarithm of the product of those probabilities divided by the number of predictions.
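For a binary classifier, log loss is usually computed as the negative average log-probability assigned to the true class. The sketch below assumes labels of 0 or 1 and predicted probabilities for class 1; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p strictly between 0 and 1
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Made-up predictions for two examples with true labels 1 and 0
confident = log_loss([1, 0], [0.9, 0.1])
hesitant = log_loss([1, 0], [0.6, 0.4])
coin_flip = log_loss([1, 0], [0.5, 0.5])
```

Confident, correct predictions give a lower log loss than hesitant ones, and predicting 0.5 for everything gives ln 2, roughly 0.693.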
Logical Data Warehouse A Logical Data Warehouse (LDW) is an architectural layer that sits on top of the usual data warehouse stores (silos) of persisted data and provides several mechanisms for viewing data without relocating and transforming data ahead of view time.
Logistic Regression Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables.
Long Short Term Memory (LSTM) Long short-term memory (LSTM) is an artificial recurrent neural network architecture used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. It can not only process single data points, but also entire sequences of data.
Long-term Memory (LTM) A permanent repository of information.
Lookback Window The lookback window allows you to control the amount of historical data that is included when using the attribution reports. For example, setting a lookback window of 14 days will include touchpoints up to 14 days before the conversion occurred. Any touchpoint outside of the lookback window won't be included in the report. The default lookback window is 30 days, but it can be set between one and 90 days.


Machine learning (ML) A process where a computer uses an algorithm to gain understanding about a set of data, then makes predictions based on its understanding. There are many types of machine learning techniques; most are classified as either supervised or unsupervised techniques. Machine learning is a method of designing systems that can learn, adjust, and improve based on the data fed to them. Using predictive and statistical algorithms that are fed to these machines, they learn and continually zero in on “correct” behavior and insights and they keep improving as more data flows through the system.
Machine Learning Engineer A data scientist does the statistical analysis required to determine which machine learning approach to use, then they model the algorithm and prototype it for testing. At that point, a machine learning engineer takes the prototyped model and makes it work in a production environment at scale.
Machine Learning Techniques The field of machine learning has grown so large that there are now positions for Machine Learning Engineers. The terms below offer a broad overview of some common techniques used in machine learning.
Machine Translation An application of NLP used for language translation (human-to-human) in text- and speech-based conversations.
Machine-Generated Data Machine-generated data is information that is the explicit result of a computer process or application process, created without human intervention. This means that data manually entered by an end user are definitely not considered to be machine generated. These data cross all sectors which make use of computers in any of their daily operations, and humans increasingly generate this data unknowingly, or at least cause it to be generated by the machine.
Mahout Apache Mahout is a project of the Apache Software Foundation that provides scalable machine learning algorithms. It is implemented on top of Apache Hadoop and uses the MapReduce paradigm.
Management Information System (MIS) A computerized information-processing system designed to support the activities of company or organizational management.
Many-to-Many Relationships Refers to a relationship between tables in a database when a parent row in one table contains several child rows in the second table, and vice versa. Many-to-many relationships are often tricky to represent. However, one or more rows in a table can be related to 0, 1, or many rows in another table. In a many-to-many relationship between Table A and Table B, each row in Table A is linked to 0, 1, or many rows in Table B and vice versa. A 3rd table called a mapping table is required in order to implement such a relationship.
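A mapping table can be sketched with Python's built-in `sqlite3` module. The table and column names (`student`, `course`, `enrollment`) and the rows are hypothetical; the point is only that the third table holds one row per linked pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT);
    -- Mapping table: one row per (student, course) link
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(id),
        course_id  INTEGER REFERENCES course(id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO course  VALUES (10, 'SQL'), (20, 'Statistics');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# Join through the mapping table to list every student/course pair
rows = conn.execute("""
    SELECT s.name, c.title
    FROM student s
    JOIN enrollment e ON e.student_id = s.id
    JOIN course c ON c.id = e.course_id
    ORDER BY s.name, c.title
""").fetchall()
```

In this data, one student is linked to two courses and one course is linked to two students, which neither of the two main tables could express on its own.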
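MapReduce MapReduce is a programming model or pattern within the Hadoop framework that is used to process big data stored in the Hadoop Distributed File System (HDFS). A map step turns input records into key/value pairs and a reduce step aggregates the values for each key. It is a core component, integral to the functioning of the Hadoop framework.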
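The classic illustration of the pattern is a word count. The sketch below runs the map, shuffle and reduce phases in a single Python process to show the data flow; it does not use Hadoop, and the function names and documents are made up for the example.

```python
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word in the document
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Group all emitted values by key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Combine the values for each key into a single result
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data big insights", "data drives insights"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
```

In a real cluster the map and reduce calls run on many machines at once, with the framework handling the grouping step between them.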
Market Basket Analysis Market basket analysis is a data mining technique used by retailers to increase sales by better understanding customer purchasing patterns. It involves analyzing large data sets, such as purchase history, to reveal product groupings, as well as products that are likely to be purchased together.
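A first step in this kind of analysis is often counting how frequently pairs of products appear in the same basket, a measure usually called support. The baskets below are invented, and the sketch stops at pair counts rather than implementing a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Fraction of baskets that contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

# Hypothetical purchase history
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["milk", "eggs"],
    ["bread", "milk"],
]
support = pair_support(baskets)
```

Pairs with high support, such as bread and butter here, are candidates for products that tend to be purchased together.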
Market Demand The total volume of a product that would be bought by a defined customer group in a defined geographical area in a defined time period in a defined marketing environment under a defined marketing program.
Market Forecast The market demand corresponding to the level of industry marketing expenditure.
Market Logistics Planning the infrastructure to meet demand, then implementing and controlling the physical flows of materials and final goods from points of origin to points of use, to meet customer requirements at a profit.
Market Mix Modeling Marketing mix modeling is statistical analysis such as multivariate regressions on sales and marketing time series data to estimate the impact of various marketing tactics on sales and then forecast the impact of future sets of tactics.
Market Opportunity Analysis (MOA) A system used to determine the attractiveness and probability of success.
Market Partitioning The process of investigating the hierarchy of attributes consumers examine in choosing a brand if they use phased decision strategies.
Market Penetration Index A comparison of the current level of market demand to the potential demand level.
Market Potential The limit approached by market demand as industry marketing expenditures approach infinity for a given marketing environment.
Market Share A higher level of selective demand for a product.
Market-buildup Method Identifying all the potential buyers in each market and estimating their potential purchases.
Market-centered Organizations Companies that are organized along market lines.
Market-management Organization A market manager supervises several market-development managers, market specialists, or industry specialists and draws on functional services as needed.
Market-penetration Pricing A pricing strategy where prices start low to drive higher sales volume from price-sensitive customers and produce productivity gains.
Market-skimming Pricing A pricing strategy where prices start high and are slowly lowered over time to maximize profits from less price-sensitive customers.
Marketer Someone who seeks a response (attention, a purchase, a vote, a donation) from another party, called the prospect.
Marketing The activity, set of institutions, and processes for creating, communicating, delivering, and exchanging offerings that have value for customers, clients, partners, and society at large.
Marketing Audit A comprehensive, systematic, independent, and periodic examination of a company’s or business unit’s marketing environment, objectives, strategies, and activities.
Marketing Channel System The particular set of marketing channels employed by a firm.
Marketing Channels Sets of interdependent organizations involved in the process of making a product or service available for use or consumption.
Marketing Communications The means by which firms attempt to inform, persuade, and remind consumers, directly or indirectly, about products and brands that they sell.
Marketing Communications Mix Advertising, sales promotion, events and experiences, public relations and publicity, direct marketing, and personal selling.
Marketing Concept The marketing concept is to find not the right customers for your products, but the right products for your customers.
Marketing Dashboard Analyzes and presents enterprise data for the purpose of displaying meaningful marketing KPIs. Lead acquisition, website conversion rate, cost per lead, web traffic, and more are examples of data points that can be visualized and displayed via a marketing dashboard.
Marketing Decision Support System (MDSS) A coordinated collection of data, systems, tools, and techniques with supporting software and hardware by which an organization gathers and interprets relevant information from business and the environment and turns it into a basis for marketing action.
Marketing Funnel Identifies the percentage of the potential target market at each stage in the decision process, from merely aware to highly loyal.
Marketing Implementation The process that turns marketing plans into action assignments and ensures that such assignments are executed in a manner that accomplishes the plan’s stated objectives.
Marketing Information System (MIS) People, equipment, and procedures to gather, sort, analyze, evaluate, and distribute information to marketing decision makers.
Marketing Insights Diagnostic information about how and why we observe certain effects in the marketplace, and what that means to marketers.
Marketing Intelligence System A set of procedures and sources managers use to obtain everyday information about developments in the marketing environment.
Marketing Management The art and science of choosing target markets and getting, keeping, and growing customers through creating, delivering, and communicating superior customer value.
Marketing Metrics The set of measures that helps firms to quantify, compare, and interpret their marketing performance.
Marketing Network The company and its supporting stakeholders, with whom it has built mutually profitable business relationships.
Marketing Opportunity An area of buyer need and interest in which there is a high probability that a company can profitably satisfy that need.
Marketing Plan A written document that summarizes what the marketer has learned about the marketplace, indicates how the firm plans to reach its marketing objectives, and helps direct and coordinate the marketing effort.
Marketing Public Relations (MPR) Publicity and other activities that build corporate or product image to facilitate marketing goals.
Marketing Research The systematic design, collection, analysis, and reporting of data and findings relevant to a specific marketing situation facing the company.
Markup Pricing an item by adding a standard increase to the product’s cost.
Mass Customization The ability of a company to meet each customer’s requirements.
Massively Parallel Processing (MPP) Massively parallel processing (MPP) is a form of collaborative processing of the same program by two or more processors. Each processor handles different threads of the program, and each processor itself has its own operating system and dedicated memory. A messaging interface is required to allow the different processors involved in the MPP to arrange thread handling. Sometimes, an application may be handled by thousands of processors working collaboratively on the application.
Master Brand A situation in which the parent brand is already associated with multiple products through brand extensions.
Materials And Parts Goods that enter the manufacturer’s product completely.
Maximum Likelihood Estimation Maximum likelihood estimation (MLE) is a technique used for estimating the parameters of a given distribution, using some observed data. For example, if a population is known to follow a normal distribution but the mean and variance are unknown, MLE can be used to estimate them using a limited sample of the population, by finding particular values of the mean and variance so that the observation is the most likely result to have occurred.
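For the normal-distribution example, the maximum likelihood estimates have a closed form: the sample mean, and the average squared deviation from it (dividing by n rather than n - 1). The sample values below are made up for illustration.

```python
def normal_mle(sample):
    """Closed-form maximum likelihood estimates for a normal distribution."""
    n = len(sample)
    mu = sum(sample) / n
    # The MLE of the variance divides by n, not n - 1
    variance = sum((x - mu) ** 2 for x in sample) / n
    return mu, variance

# Hypothetical limited sample from the population
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, var_hat = normal_mle(sample)
```

For distributions without a closed-form solution, the same idea is applied by numerically searching for the parameter values that make the observed data most likely.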
Mean The statistical mean refers to the mean or average that is used to derive the central tendency of the data in question. It is determined by adding all the data points in a population and then dividing the total by the number of points. The resulting number is known as the mean or the average.
Mean (Average, Expected Value) A calculation that gives us a sense of a “typical” value for a group of numbers. The mean is the sum of a list of values divided by the number of values in that list. It can be misleading when used on its own, and in practice we use the mean with other statistical values to gain intuition about our data.
Measure A calculated numerical value. It can be a sum, a count, an average, a percentage, etc. Examples of measures would be gross sales, profit, profit percentage. In cubes, many measures are pre-calculated, providing extremely fast performance when analyzing data. Examples of measures in a sales cube may include Sales Amount, Profit YTD, Average Unit Cost, Document Count, etc.
Measurement Protocol The Measurement Protocol allows hits to be sent directly to Google Analytics without needing to use the Google Analytics tracking code or Google Tag Manager. This can be used to send data from any internet-enabled device to Google Analytics. For example, the Measurement Protocol can be used to send data from a point of sale terminal in a store, a self-service kiosk or gaming console.
Media Selection Finding the most cost-effective media to deliver the desired number and type of exposures to the target audience.
Median The median is one of the three primary ways to find the average of statistical data. It is harder to calculate than the mode, but not as labor intensive as calculating the mean. It is the center of the data in much the same way as the middle person is the center of a line of people. After listing the data values in ascending order, the median is the data value with the same number of data values above it and below it; when the number of values is even, it is the mean of the two middle values.
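Python's standard `statistics` module can be used to compare the median with the mean. The salary figures are invented; the last value is deliberately extreme to show how differently the two measures react.

```python
import statistics

# Hypothetical salaries (in thousands) with one extreme value
salaries = [30, 32, 35, 38, 400]

mean_salary = statistics.mean(salaries)      # pulled upwards by the outlier
median_salary = statistics.median(salaries)  # the middle value after sorting

# With an even number of values, the two middle values are averaged
even_median = statistics.median([1, 2, 3, 4])
```

The single large salary drags the mean far above what most people in the list earn, while the median stays at the middle value.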
Medium Medium is one of the four main dimensions (along with source, campaign and channel) for reporting and analyzing how people found your website. Medium tells you how the message was communicated. For example, ‘organic’ for free search traffic, ‘cpc’ for cost-per-click and ‘referral’ for inbound links from other websites.
Megatrends Large social, economic, political, and technological changes that are slow to form, and once in place, have an influence for seven to ten years or longer.
Membership Groups Groups having a direct influence on a person.
Memory Retrieval How and from where information gets out of memory.
Mental Accounting The manner by which consumers code, categorize, and evaluate financial outcomes of choices.
Metadata Metadata is data about data. In other words, it is data that is used to describe another item's content. The term metadata is often used in the context of Web pages, where it describes page content for a search engine.
Metrics A method of measuring something, or the results obtained from this, for example: ‘the report provides various metrics at the class and method level.’
Microsales Analysis Examination of specific products and territories that fail to produce expected sales.
Microsite A limited area on the Web managed and paid for by an external advertiser/company.
Microsoft BI Stack Microsoft’s suite of business intelligence tools that cater to the individual, team, and enterprise level depending on the needs of the organization. This encompasses SQL Server, OLAP cubes, SharePoint, PerformancePoint, Excel, and Dynamics. In conjunction, these tools can be combined to collect, store, analyze, segment, and visualize data and generate meaningful insights to drive business decisions.
MIS MIS is the use of information technology, people, and business processes to record, store and process data to produce information that decision makers can use to make day to day decisions. MIS is the acronym for Management Information Systems. In a nutshell, MIS is a collection of systems, hardware, procedures and people that all work together to process, store, and produce information that is useful to the organization.
Mission Statements Statements that organizations develop to share with managers, employees, and (in many cases) customers.
ML-as-a-Service (MLaaS) Machine learning as a service (MLaaS) is a range of services that offer machine learning tools as part of cloud computing services, as the name suggests. MLaaS providers offer tools including data visualization, APIs, face recognition, natural language processing, predictive analytics and deep learning. The provider's data centers handle the actual computation.
Mobile Analytics Refers to the accessibility of meaningful and pointed data, dashboards, and reports for end users on mobile devices, such as tablets and smart phones.
Mobile BI Mobile BI is software that extends desktop business intelligence applications so they can be used on a mobile device. Phocas is a fully mobile software solution that allows access to your data, dashboards and reports wherever you are on smartphones and tablets.
Mobile Dashboards Business intelligence dashboard that can be accessed via a mobile device. Given the gravitation of business to an always-on, anywhere mentality, business leaders are embracing mobile dashboards and mobile analytics as a means of effectively accessing, analyzing, and optimizing data from mobile devices. Jet Global is an example of a leading mobile dashboard solution.
Mode The mode is a statistical term that refers to the most frequently occurring number found in a set of numbers. The mode is found by collecting and organizing data in order to count the frequency of each result. The result with the highest count of occurrences is the mode of the set, also referred to as the modal value.
Model Selection Model selection is the task of selecting a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered. However, the task can also involve the design of experiments such that the data collected is well-suited to the problem of model selection.
MongoDB MongoDB is an open source, NoSQL, document-oriented database program. It stores data as JSON-like documents with a flexible schema, in a binary format known as BSON. It integrates data in applications very quickly and easily.
Monte Carlo Simulation The idea behind Monte Carlo Simulation is to use random samples of parameters or inputs to explore the behavior of a complex process. Monte Carlo simulations sample from a probability distribution for each variable to produce hundreds or thousands of possible outcomes. The results are analyzed to get probabilities of different outcomes occurring.
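The sampling idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the cost model, the two distributions, the budget and the function name `simulate_project_cost` are all assumptions made up for the example.

```python
import random

def simulate_project_cost(n_trials=10_000, budget=105.0, seed=42):
    """Estimate the probability that total cost exceeds a budget."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    over_budget = 0
    for _ in range(n_trials):
        # Each input is drawn from its own (assumed) probability distribution.
        labor = rng.gauss(60.0, 5.0)         # normal: mean 60, sd 5
        materials = rng.uniform(30.0, 50.0)  # uniform between 30 and 50
        if labor + materials > budget:
            over_budget += 1
    # The share of simulated outcomes over budget estimates the probability.
    return over_budget / n_trials

p = simulate_project_cost()
print(f"Estimated P(cost > budget): {p:.3f}")
```

Increasing `n_trials` generally makes the estimate more stable, at the cost of more computation.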
Motive A need aroused to a sufficient level of intensity to drive us to act.
MPP Massively Parallel Processing, or MPP, refers to the use of a large number of processors to perform a set of coordinated computations in parallel or simultaneously. Processing is spread across clusters of servers in order to share the workload.
Multi-Class Classification Problems which have more than two classes in the target variable are called multi-class classification problems. For example, the target may be to predict the quality of a product, which can be excellent, good, average, fair or bad. In this case, the variable has 5 classes, hence it is a 5-class classification problem.
Multi-Dimensional Database (MDB) A multidimensional database (MDB) is a kind of database which is optimized for OLAP (Online Analytical Processing) applications and data warehousing. An MDB can be easily created by using a relational database as input. An MDB processes the data in the database so that results can be produced quickly.
Multi-Value Database A MultiValue database is a flexible database that combines features of NoSQL and multidimensional databases. It is gaining popularity as a flexible and user-friendly database system that does not require database administrator support. It saves memory, disk space and processing time. A MultiValue database system is closely related to the PICK database designed for the Pick operating system.
Multichannel Marketing An approach in which a single firm uses two or more marketing channels to reach one or more customer segments.
Multidimensional Databases (MDB) A multidimensional database is a specific type of database that has been optimized for data warehousing and OLAP (online analytical processing). A multi-dimensional database is structured by a combination of data from various sources that work amongst databases simultaneously and that offer networks, hierarchies, arrays, and other data formatting methods. In a multidimensional database, the data is presented to its users through multidimensional arrays, and each individual value of data is contained within a cell which can be accessed by multiple indexes.
Multidimensional expressions (or MDX) A querying language for OLAP or relational databases, with syntax similar to spreadsheet formulae. Due to its simplicity and straightforward syntax, it has quickly become the standard for OLAP systems over the more complex SQL.
Multitasking Doing two or more things at the same time.
Multivariate Analysis Multivariate analysis is used to study more complex sets of data than what univariate analysis methods can handle. This type of analysis is almost always performed with software (e.g. SPSS or SAS), as working with even the smallest of data sets can be overwhelming by hand.
Multivariate Regression Multivariate Regression is a method used to measure the degree to which more than one independent variable (predictors) and more than one dependent variable (responses) are linearly related. The method is broadly used to predict the behavior of the response variables associated with changes in the predictor variables, once a desired degree of relation has been established.


Naïve Bayes A naive Bayes classifier is an algorithm that uses Bayes' theorem to classify objects. Naive Bayes classifiers assume strong, or naive, independence between attributes of data points. Popular uses of naive Bayes classifiers include spam filters, text analysis and medical diagnosis. These classifiers are widely used for machine learning because they are simple to implement.
NaN Not a Number (NaN) represents an undefined number in floating-point operations. A Not a Number indicator may also be a sign that a variable that is supposed to be a numerical value has been corrupted by text characters or symbols.
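A short Python sketch of how NaN behaves in practice. The helper `to_number` and the sample readings are hypothetical, introduced only to illustrate the "corrupted by text" case described above.

```python
import math

# An undefined floating-point operation produces NaN.
x = float("inf") - float("inf")
print(x == x)         # NaN compares unequal to everything, including itself
print(math.isnan(x))  # math.isnan is the reliable way to test for NaN

def to_number(text):
    """Parse text as a float; return NaN when the text is not numeric."""
    try:
        return float(text)
    except ValueError:
        return float("nan")

# A numeric column polluted by a text value (hypothetical data).
readings = [to_number(t) for t in ["3.5", "n/a", "7"]]
clean = [r for r in readings if not math.isnan(r)]
print(clean)
```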
Narrow Intelligence The ability to accomplish a very specific (narrow) set of goals, tasks, or objectives such as mastering a video game, or driving a car. AI currently falls within the remit of narrow intelligence.
Natural Language Processing Natural language processing is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human languages, in particular how to program computers to process and analyze large amounts of natural language data.
Natural language processing (NLP) Natural language processing (NLP) helps computers process, interpret, and analyze human language and its characteristics by using natural language data.
Negative Demand A demand state in which consumers dislike the product and may even pay to avoid it.
Network Analysis Network analysis is the application of graph theory to categorize, understand, and view relationships between nodes in a network. It is an effective way of analyzing connections and checking their capabilities in fields such as prediction, marketing analysis, and healthcare.
Net Price Analysis Analysis that encompasses company list price, average discount, promotional spending, and co-op advertising to arrive at net price.
Neural Network Models inspired by the real-life biology of the brain. These are used to estimate mathematical functions and facilitate different kinds of learning algorithms. Deep Learning is a similar term and is generally seen as a modern buzzword, rebranding the Neural Network paradigm for the modern day.
Neural Networks A machine learning method that’s very loosely based on neural connections in the brain. Neural networks are a system of connected nodes that are segmented into layers — input, output, and hidden layers. The hidden layers (there can be many) are the heavy lifters used to make predictions. Values from one layer are filtered by the connections to the next layer, until the final set of outputs is given and a prediction is made.
Neural Network Modern terminology uses neural networks to refer to artificial neural networks (ANNs); however, a neural network could also be used to define a biological neural circuit composed of neurons (the structure on which ANNs are based).
NewSQL NewSQL is a class of database systems that builds on the concepts and principles of Structured Query Language (SQL) databases and NoSQL systems. By combining the reliability of SQL databases with the speed and scalability of NoSQL, NewSQL aims to provide improved functionality and services.
New User People who visit your website for the first time in the selected date range. Since users are based on the Google Analytics tracking code and browser cookies, it’s important to highlight that people who cleared their cookies or access your website using a different device will be reported as new users.
New Visitor A small number of reports reference new and returning visitors. A new visitor is reported when someone visits your website for the first time in the selected date range. If there are no existing Google Analytics cookies for a user, then they will be reported as new. Users can be counted as both new and returning if they visit your website multiple times in the date range. The metrics reported for the ‘User Type’ dimension can be different when a session spans two days (over midnight), as one ‘User’ will be reported along with two ‘New Users’.
Nominal Variable Nominal variables are categorical variables having two or more categories without any kind of order to them. For example, a column called “name of cities” with values such as Delhi, Mumbai, Chennai, etc. We can see that there is no order between the values: Delhi is in no particular way higher or lower than Mumbai (unless explicitly mentioned).
Non-value-adding Non-value-adding refers to activities within a company or supply chain that do not directly contribute to satisfying end consumers’ requirements. It is useful to think of these as activities that consumers would not be happy to pay for.
Noncompensatory Models In consumer choice, models in which consumers do not simultaneously consider all positive and negative attribute considerations in making a decision.
Nonexistent Demand A demand state in which consumers may be unaware of or uninterested in the product.
Normal Distribution Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. In graph form, normal distribution will appear as a bell curve.
Normalization Normalization is a database design technique which organizes tables in a manner that reduces redundancy and dependency of data. It divides larger tables to smaller tables and links them using relationships.
Normalize A set of data is said to be normalized when all of the values have been adjusted to fall within a common range. We normalize data sets to make comparisons easier and more meaningful. For instance, taking movie ratings from a bunch of different websites and adjusting them so they all fall on a scale of 0 to 100.
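A minimal sketch of the movie-rating example using min-max rescaling, one common way to normalize. The function name `rescale`, the rating scales and the sample values are assumptions for illustration only.

```python
def rescale(values, old_min, old_max, new_min=0.0, new_max=100.0):
    """Linearly map values from [old_min, old_max] to [new_min, new_max]."""
    span = old_max - old_min
    return [new_min + (v - old_min) * (new_max - new_min) / span for v in values]

# Hypothetical ratings for the same three movies on two websites.
site_a = rescale([3.5, 4.0, 5.0], old_min=0, old_max=5)    # 5-star scale
site_b = rescale([7.0, 8.0, 10.0], old_min=0, old_max=10)  # 10-point scale

print(site_a)  # now on a 0 to 100 scale
print(site_b)  # directly comparable with site_a
```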
NoSQL NoSQL is a non-relational DBMS that does not require a fixed schema, avoids joins, and is easy to scale. NoSQL databases are used for distributed data stores with very large data storage needs, such as big data and real-time web apps. Examples include companies like Twitter, Facebook and Google, which collect terabytes of user data every single day.
NoSQL (Not ONLY SQL) NoSQL is a class of database management systems (DBMS) that do not follow all of the rules of a relational DBMS and cannot use traditional SQL to query data. The term is somewhat misleading when interpreted as "No SQL," and most translate it as "Not Only SQL," as this type of database is not generally a replacement but, rather, a complementary addition to RDBMSs and SQL.
NoSQL It almost sounds like a protest against SQL (Structured Query Language), the bread and butter of traditional Relational Database Management Systems (RDBMS), but NoSQL actually stands for Not ONLY SQL. NoSQL refers to database management systems that are designed to handle large volumes of data that do not have a structure, or what is technically called a ‘schema’ (as relational databases have). NoSQL databases are often well-suited for big data systems because of their flexibility and the distributed-first architecture needed for large unstructured databases.
Not Provided In the organic keywords report, not provided indicates that a search engine prevented the individual keyword from being reported. The majority of not provided organic keywords come from Google search results, where anybody performing a search on the secure version of Google (e.g. https://www.google.com) will have their individual organic keyword withheld from analytics tools, including Google Analytics.
Not Set Not set can be seen in a number of different reports and indicates that a particular piece of information is not available within the report. For example, in the Location report, not set indicates that Google Analytics was unable to determine someone’s exact geographic location when they accessed your website. While not set in the Source/Medium report occurs when a campaign tagged URL hasn’t been fully constructed (for example, if ‘source’ isn’t defined it will be displayed as not set within the report).
Null Hypothesis (H0) A null hypothesis (H0) is a stated assumption that there is no difference in parameters (mean, variance, DPMO) for two or more populations. According to the null hypothesis, any observed difference in samples is due to chance or sampling error.
NumPy NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.


Object Databases A database that stores data in the form of objects is known as an object database. These objects are used in the same manner as objects in object-oriented programming (OOP). An object database is different from graph and relational databases. Most object databases provide a query language that helps find objects declaratively.
Object-based Image Analysis It is the analysis of object-based images that is performed with data taken by selected related pixels, known as image objects or simply objects. It is different from the digital analysis that is done using data from individual pixels.
OLAP Cube A method of storing data in a multidimensional form, generally for reporting purposes. In OLAP cubes, data measures are categorized by dimensions. OLAP cubes are often pre-summarized across dimensions, which drastically improves query time compared with querying relational databases.
Omnichannel Omnichannel refers to the growing need for a consistent customer experience across multiple channels. For example, if a customer is shopping for your product, their experience across their tablet, smartphone, desktop or in-store should be relatively the same.
One Hot Encoding One Hot encoding is done usually in the preprocessing step. It is a technique which converts categorical variables to numerical in an interpretable format. In this we create a Boolean column for each category of the variable.
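A minimal pure-Python sketch of the idea: one 0/1 column per category. The function name `one_hot` and the color values are hypothetical; in practice a data library would usually handle this step.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

categories, rows = one_hot(["red", "green", "blue", "green"])
print(categories)  # column order
for row in rows:
    print(row)     # exactly one column is 1 in each row
```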
One Sample T-test The one sample t-test is a statistical procedure used to determine whether a sample of observations could have been generated by a process with a specific mean. Suppose you are interested in determining whether an assembly line produces laptop computers that weigh five pounds. To test this hypothesis, you could collect a sample of laptop computers from the assembly line, measure their weights, and compare the sample with a value of five using a one-sample t-test.
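The laptop example can be sketched with the standard library alone. The weights below are made-up sample data and `one_sample_t` is a hypothetical helper; it computes only the t statistic and degrees of freedom, which would then be compared against a t-distribution table or passed to a statistics package to obtain a p-value.

```python
import math
import statistics

def one_sample_t(sample, hypothesized_mean):
    """Return the t statistic and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)       # sample standard deviation (n - 1)
    standard_error = sd / math.sqrt(n)
    t = (mean - hypothesized_mean) / standard_error
    return t, n - 1

# Hypothetical laptop weights in pounds; the hypothesized mean is five.
weights = [5.1, 4.9, 5.2, 5.0, 5.3, 4.8, 5.1, 5.0]
t, df = one_sample_t(weights, 5.0)
print(f"t = {t:.3f}, degrees of freedom = {df}")
```

A t statistic close to zero suggests the sample is consistent with the hypothesized mean; a large absolute value suggests it is not.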
One Shot Learning It is a machine learning approach where the model is trained on a single example. One-shot Learning is generally used for object classification. This is performed to design effective classifiers from a single training example.
One version of the truth (or ‘single version of the truth’, or SVOT) A technical concept describing the business analysis ideal of having either a single centralized database (data warehouse), or at least a distributed synchronized database, which stores all of an organization’s data in a consistent and non-redundant form. A combination of software, data quality, and strong data leadership can help enterprises and organizations achieve SVOT.
Online Analytical Processing (OLAP) Online analytical processing (OLAP) is a high-level concept that describes a category of tools that aid in the analysis of multidimensional queries. OLAP came about because of the tremendous complexity and sheer growth associated with business data during the 1970s, as the volume and type of information became too heavy for adequate analysis through simple structured query language (SQL) queries.
Online Analytical Processing (OLAP) Online Analytical Processing (OLAP) is computer processing that enables a user to easily and selectively extract and view data from different points of view.
Online Transaction Processing (OLTP) Online Transaction Processing (OLTP) is a class of software programs capable of supporting transaction-oriented applications on the internet.
Online transactional processing (OLTP) Online transaction processing (OLTP) is a class of systems that supports or facilitates high-volume, transaction-oriented applications. OLTP’s primary system features are immediate client feedback and high individual transaction volume.
Oozie Apache Oozie is a workflow scheduler for Hadoop. It is a system which runs the workflow of dependent jobs. Here, users are permitted to create Directed Acyclic Graphs of workflows, which can be run in parallel and sequentially in Hadoop.
Open Data Center Alliance (ODCA) ODCA is a consortium of IT organizations from around the globe. Its main goal is to accelerate the adoption of cloud computing.
Open Database Connectivity (ODBC) Open Database Connectivity (ODBC) is an open standard application programming interface (API) for accessing a database.
Operational BI Operational business intelligence, sometimes called real-time business intelligence, is an approach to data analysis that enables decisions to be taken based on the real-time data companies generate and use on a day-to-day basis.
Operational Data Store (ODS) An operational data store (ODS) is a type of database that collects data from multiple sources for processing, after which it sends the data to operational systems and data warehouses. It provides a central interface or platform for all operational data used by enterprise systems and applications.
Operational Databases An operational database is a database that is used to manage and store data in real time. An operational database is the source for a data warehouse. Elements in an operational database can be added and removed on the fly. These databases can be either SQL or NoSQL-based, where the latter is geared toward real-time operations.
Operational Reporting Tactical, real-time reporting that reflects and supports day-to-day activity at an organizational level. Examples of operational reporting include bank teller window balancing reports, daily production records, and transaction logs.
Opinion Leader The person in informal, product-related communications who offers advice or information about a specific product or product category.
Optical Character Recognition (OCR) The conversion of images of text (typed, handwritten, or printed), either electronically or mechanically, into machine-encoded text.
Optimization The action of making the best or most effective use of a situation or resource.
Optimization Analysis The process of finding optimal problem parameters subject to constraints. Optimization algorithms heuristically test a large number of parameter configurations in order to find an optimal result, determined by a characteristic function (also called a fitness function).
Ordering Ease How easy it is for the customer to place an order with the company.
Ordinal Variable An ordinal variable is a categorical variable for which the possible values are ordered. Ordinal variables can be considered “in between” categorical and quantitative variables. Example: Educational level might be categorized as 1: Elementary school education 2: High school graduate 3: Some college 4: College graduate 5: Graduate degree
Organic Organic refers to people clicking on a free link from a search results page. For example, people clicking through to your website from a free result on a Google search results page.
Organic Search When a visitor originates from a search engine. This includes, but is not limited to, Google, Bing, and Yahoo.
Organizational Buying The decision-making process by which formal organizations establish the need for purchased products and services and identify, evaluate, and choose among alternative brands and suppliers.
Outlier An outlier is a data point that is considered extremely far from other points. They are generally the result of exceptional cases or errors in measurement, and should always be investigated early in a data analysis workflow.
Outlier Detection Outlier detection is the process of detecting and subsequently excluding outliers from a given set of data. An outlier may be defined as a piece of data or observation that deviates drastically from the given norm or average of the data set. An outlier may be caused simply by chance, but it may also indicate measurement error or that the given data set has a heavy-tailed distribution.
Overall Market Share The company’s sales expressed as a percentage of total market sales.
Overfitting In statistics and machine learning, overfitting occurs when a model fits the noise in its training data rather than the underlying trend. It is typically the result of an overly complex model with too many parameters. An overfitted model is inaccurate on new data because the trend it has learned does not reflect the reality of the data.
Overfull Demand A demand state in which more consumers would like to buy the product than can be satisfied.


P-Value The p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one actually observed. It is used as an alternative to rejection points, giving the smallest level of significance at which the null hypothesis would be rejected. A smaller p-value means stronger evidence against the null hypothesis and in favor of the alternative hypothesis.
Packaging All the activities of designing and producing the container for a product.
Page The page shows the part of the URL after your domain name (path) when someone has viewed content on your website. For example, if someone views https://www.example.com/contact then /contact will be reported as the page inside the Behavior reports.
Page Value Allows you to understand the impact of your website’s pages in driving value based on ecommerce transactions and goal conversions (where a goal value has been set). Each page that led to a conversion shares the value that was generated by the conversion.
Pages Per Session A top-level metric for user engagement showing the average number of pageviews in each session.
Pageview A pageview is reported when a page has been viewed by a user on your website. In the Google Analytics pages report, by default, your pages are ordered by popularity based on pageviews. This allows you to see which content is being viewed most often.
Paid Search Paid search is a form of digital marketing where search engines such as Google and Bing allow advertisers to show ads on their search engine results pages (SERPs). Paid search works on a pay-per-click model, meaning you do exactly that – until someone clicks on your ad, you don't pay.
Paired T-test The paired sample t-test, sometimes called the dependent sample t-test, is a statistical procedure used to determine whether the mean difference between two sets of observations is zero. In a paired sample t-test, each subject or entity is measured twice, resulting in pairs of observations.
Pandas Pandas is a library for the Python programming language for manipulating and analyzing data tables. It is often useful for preparing data in machine learning and neural network projects, or in other projects where Python plays a role.
Parallel Data Analysis The process of breaking an analytical problem into small partitions and then running analysis algorithms on each of the partitions simultaneously is known as parallel data analysis. This type of data analysis can be run either on different systems or on the same system.
Parallel Method Invocation (PMI) Parallel method invocation (PMI) is a computational concept used in programming and Big Data analytics which allows an application to call or invoke functions or methods in parallel, as opposed to one after the other which is the norm in most instances.
Parallel Processing Parallel processing is a method of simultaneously breaking up and running program tasks on multiple microprocessors, thereby reducing processing time. Parallel processing may be accomplished via a computer with two or more processors or via a computer network.
Parallel Query Parallel query is a method used to increase the execution speed of SQL queries by creating multiple query processes that divide the workload of a SQL statement and executing it in parallel or at the same time.
Parameters A parameter is a special kind of variable in computer programming language that is used to pass information between functions or procedures. The actual information passed is called an argument.
Parent Brand An existing brand that gives birth to a brand extension.
Partner Relationship Management (PRM) Activities the firm undertakes to build mutually satisfying long-term relations with key partners such as suppliers, distributors, ad agencies, and marketing research suppliers.
Parts of a Workflow While every workflow is different, these are some of the general processes that data professionals use to derive insights from data.
Pattern Recognition Pattern recognition is the ability to detect arrangements of characteristics or data that yield information about a given system or data set. In a technological context, a pattern might be recurring sequences of data over time that can be used to predict trends, particular configurations of features in images that identify objects, frequent combinations of words and phrases for natural language processing (NLP), or particular clusters of behavior on a network that could indicate an attack -- among almost endless other possibilities.
Penetrated Market The set of consumers who are buying a company’s product.
Pentaho Pentaho, a software organization, provides open source Business Intelligence products known as Pentaho Business Analytics. Pentaho offers OLAP services, data integration, dashboarding, reporting, ETL, and data mining capabilities.
Perceived Value The value promised by the company’s value proposition and perceived by the customer.
Percentage of New Sessions Shows the percentage of sessions for people who have not previously been to your website. The metric is calculated by dividing the number of new users by the total number of sessions. For example, if 100 people visited your website for the first time out of a total of 200 sessions, then the percentage of new sessions would be reported as 50%.
Perception The process by which an individual selects, organizes, and interprets information inputs to create a meaningful picture of the world.
Performance Marketing Understanding the financial and nonfinancial returns to business and society from marketing activities and programs.
Performance Quality The level at which the product’s primary characteristics operate.
Personal Communications Channels Two or more persons communicating directly face-to-face, person-to-audience, over the telephone, or through e-mail.
Personal Influence The effect one person has on another’s attitude or purchase probability.
Personality A set of distinguishing human psychological traits that lead to relatively consistent responses to environmental stimuli.
Personalization Personalization is the process by which a user customizes a desktop, or Web-based interface, to suit personal preferences.
Personally Identifiable Information (PII) You’ve probably heard this term multiple times before. For clarification, PII refers to any user data that could be used to distinguish one person from another. Standard PII identifiers include phone numbers, email addresses, social security numbers or mailing addresses.
Petabyte The petabyte is a multiple of the unit byte for digital information. The prefix peta indicates the fifth power of 1000 and means 10¹⁵ in the International System of Units, so 1 petabyte is one quadrillion bytes, or one million billion bytes.
Pie Chart A pie chart is a circular statistical graphic, which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice is proportional to the quantity it represents.
Pig A high-level platform for creating programs that run on Hadoop, Apache Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin is the language used for this platform, which can be extended using user-defined functions (UDFs) that the users can write in Java, Python, JavaScript, Ruby, or Groovy.
PII (Personally Identifiable Information) According to the Google Analytics Terms of Service, you are prevented from collecting PII (personally identifiable information) into your reports. This includes email addresses, full names and other personal details. However, according to the Terms of Service you are able to collect IDs that can then be linked to individuals outside of Google Analytics.
Place Advertising (also called out-of-home advertising) Ads that appear outside the home, where consumers work and play.
Platform A standard for the hardware of a computer system that determines which kinds of software it can run. For example, Phocas can be delivered via cloud software-as-a-service (SaaS), private cloud or on premise.
Point-of-Purchase (P-O-P) The location where a purchase is made, typically thought of in terms of a retail setting.
Points-of-Difference (PODs) Attributes or benefits that consumers strongly associate with a brand, positively evaluate, and believe they could not find to the same extent with a competitive brand.
Points-of-Parity (POPs) Attribute or benefit associations that are not necessarily unique to the brand but may in fact be shared with other brands.
Polynomial Regression Polynomial Regression is a form of linear regression in which the relationship between the independent variable x and dependent variable y is modeled as an nth degree polynomial. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x).
Population A dataset that consists of all the members of some group. Descriptive parameters (such as μ and σ) are used to describe the population.
Portal Portal is a term, generally synonymous with gateway, for a World Wide Web site that is or proposes to be a major starting site for users when they get connected to the Web or that users tend to visit as an anchor site. There are general portals and specialized or niche portals. Some major general portals include Yahoo, Excite, Netscape, Lycos, CNET, Microsoft Network, and America Online's AOL.com.
Position Based This is an Attribution model in Google Analytics that gives 40% of the credit to the first interaction, 40% to the last interaction, and distributes the remaining 20% evenly across the middle interactions.
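The 40/20/40 split can be sketched in a few lines. This is an illustrative calculation only, not Google Analytics code: the function name `position_based_credit` is hypothetical, and the handling of paths with one or two interactions (100% and 50/50) is an assumption made so that the credit always sums to 1.

```python
def position_based_credit(interactions):
    """Return {channel: share of credit} for an ordered list of interactions."""
    n = len(interactions)
    if n == 0:
        return {}
    if n == 1:
        shares = [1.0]            # assumption: a single touch gets all credit
    elif n == 2:
        shares = [0.5, 0.5]       # assumption: no middle, so split evenly
    else:
        middle = 0.20 / (n - 2)   # 20% spread evenly over middle touches
        shares = [0.40] + [middle] * (n - 2) + [0.40]

    credit = {}
    for channel, share in zip(interactions, shares):
        # A channel that appears more than once accumulates its shares.
        credit[channel] = credit.get(channel, 0.0) + share
    return credit


path = ["organic", "email", "social", "direct"]
credit = position_based_credit(path)
```

For the four-step path above, the first and last channels each receive 0.4 and the two middle channels receive 0.1 each.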
Positioning Positioning the act of designing a company’s offering and image to occupy a distinctive place in the minds of the target market.
Potential Market Potential Market the set of consumers who profess a sufficient level of interest in a market offer.
Potential Product Potential Product all the possible augmentations and transformations the product or offering might undergo in the future.
Power (1-beta) The ability of a statistical test to detect a real difference when there is one; the probability of correctly rejecting the null hypothesis. Determined by alpha, sample size and the size of the effect being detected.
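To make the dependence on alpha and sample size concrete, here is a minimal sketch of the power of a two-sided one-sample z-test, using `statistics.NormalDist` from the Python standard library. The function name `z_test_power` and its parameters (`effect` for the true shift in the mean, `sigma` for a known standard deviation) are assumptions chosen for illustration; real studies often use t-tests or other designs with different formulas.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean shift is `effect`."""
    z = NormalDist()                      # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)     # two-sided critical value
    shift = effect * n ** 0.5 / sigma     # mean of the test statistic under H1
    # Probability that the statistic lands in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Same effect and alpha, two different sample sizes.
power_small = z_test_power(effect=0.5, sigma=1.0, n=16)
power_large = z_test_power(effect=0.5, sigma=1.0, n=64)
```

With everything else held fixed, the larger sample gives the higher power, and with no real effect (`effect=0`) the rejection probability falls back to alpha.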
Power BI Microsoft’s flagship business analytics data visualization tool. Power BI includes graphically robust dashboards and advanced analytics on both mobile and desktop devices.
Pre-trained Model A pre-trained model is a model created by someone else to solve a similar problem. Instead of building a model from scratch, you use the model trained on another problem as a starting point.
Precision and Recall In pattern recognition, information retrieval and binary classification, precision is the fraction of relevant instances among the retrieved instances, while recall is the fraction of relevant instances that have been retrieved over the total amount of relevant instances.
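For binary classification, both quantities follow directly from counts of true positives, false positives and false negatives. The sketch below is illustrative; the function name `precision_recall` and the sample labels are made up, and returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Precision: of everything retrieved (predicted positive), how much was relevant.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything relevant (actually positive), how much was retrieved.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)  # 3 TP, 1 FP, 1 FN
```

Here both precision and recall come out at 0.75, since there are three true positives, one false positive and one false negative.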
Predictive Analytics Predictive analytics is a form of advanced analytics used to make predictions about unknown future events. This is done in several ways, ranging from data mining, statistics, modeling and machine learning to the next generation of artificial intelligence.
Predictive Modelling Predictive modeling is a process that uses data mining and probability to forecast outcomes. Each model is made up of a number of predictors, which are variables that are likely to influence future results. Once data has been collected for relevant predictors, a statistical model is formulated. The model may employ a simple linear equation, or it may be a complex neural network, mapped out by sophisticated software. As additional data becomes available, the statistical analysis model is validated or revised.
Predictor Variable Predictor variable is the name given to an independent variable used in regression analyses. The predictor variable provides information on an associated dependent variable regarding a particular outcome.
Prescriptive Analytics Still using the credit card transactions example, you may want to find out which spending to target (i.e. food, entertainment, clothing etc.) to make a huge impact on your overall spending. Prescriptive analytics builds on predictive analytics by including ‘actions’ (i.e. reduce food or clothing or entertainment) and analyzing the resulting outcomes to ‘prescribe’ the best category to target to reduce your overall spend. You can extend this to Big Data and imagine how executives can make data-driven decisions by looking at the impacts of various actions in front of them.
Previous Page Path Previous page path is a dimension that allows you to see the page viewed immediately before another page within a session. Previous page path can be useful for reviewing navigation paths people are using between individual pages on your website.
Price Discrimination Price Discrimination a company sells a product or service at two or more prices that do not reflect a proportional difference in costs.
Price Escalation Price Escalation an increase in the price of a product due to added costs of selling it in different countries.
Primary Groups Primary Groups groups with which a person interacts continuously and informally, such as family, friends, neighbors, and coworkers.
Principal Component Analysis (PCA) Principal component analysis (PCA) is a technique used for identification of a smaller number of uncorrelated variables known as principal components from a larger set of data. The technique is widely used to emphasize variation and capture strong patterns in a data set. Invented by Karl Pearson in 1901, principal component analysis is a tool used in predictive models and exploratory data analysis. Principal component analysis is considered a useful statistical method and used in fields such as image compression, face recognition, neuroscience and computer graphics.
Principle of Congruity Principle of Congruity psychological mechanism that states that consumers like to see seemingly related objects as being as similar as possible in their favorability.
Private label Brand Private label Brand brands that retailers and wholesalers develop and market.
Probability Probability is a measure quantifying the likelihood that events will occur. It is expressed as a number between 0 and 1, where, roughly speaking, 0 indicates impossibility and 1 indicates certainty.
Probability Distribution A statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. Probability distributions may be discrete or continuous.
Product Product anything that can be offered to a market to satisfy a want or need, including physical goods, services, experiences, events, person, places, properties, organizations, information, and ideas.
Product Adaptation Product Adaptation altering the product to meet local conditions or preferences.
Product Assortment Product Assortment the set of all products and items a particular seller offers for sale.
Product Concept Product Concept proposes that consumers favor products offering the most quality, performance, or innovative features.
Product Invention Product Invention creating something new through product development or other means.
Product Map Product Map competitors’ items that are competing against company X’s items.
Product System Product System a group of diverse but related items that function in a compatible manner.
Product-mix Pricing Product-mix Pricing the firm searches for a set of prices that maximizes profits on the total mix.
Product-penetration Percentage Product-penetration Percentage the percentage of ownership or use of a product or service in a population.
Production Concept Production Concept holds that consumers prefer products that are widely available and inexpensive.
Profitable Customer Profitable Customer a person, household, or company that over time yields a revenue stream that exceeds by an acceptable amount the company’s cost stream of attracting, selling, and servicing that customer.
Project Manager An individual who is responsible for the planning, organization, resource management, and discipline pertaining to the successful completion of a specific project or objective. Our experience is that project managers are most effective when they come from the IT organization – technical skills being critical to understanding the level of effort and time demands of data warehousing tasks. Communication and PM skills are critical as well as a comfort level around senior executives.
Properties A property can be any number of defined data points that you are tracking. For example, a customer property could include age, gender, location, company, email, revenue, etc. You can also define specific properties that you’d like to track such as industry, website, number of Twitter followers, etc.
Property Properties are created within a Google Analytics account. Each property represents an instance of the tracking ID used to collect data from a website, group of websites, a mobile app or the Measurement Protocol. Each property will include data sent to the associated tracking ID. Once data has been collected it is processed in the reporting view (or views) created under the property.
Prospect Theory Prospect Theory when consumers frame decision alternatives in terms of gains and losses according to a value function.
Psychographics Psychographics the science of using psychology and demographics to better understand consumers.
Public Public any group that has an actual or potential interest in or impact on a company’s ability to achieve its objectives.
Public Data Public data is information that can be freely used, reused and redistributed by anyone with no existing local, national or international legal restrictions on access or usage.
Public Relations (PR) Public Relations (PR) a variety of programs designed to promote or protect a company’s image or its individual products.
Publicity Publicity the task of securing editorial space—as opposed to paid space—in print and broadcast media to promote something.
Pull Strategy Pull Strategy when the manufacturer uses advertising and promotion to persuade consumers to ask intermediaries for the product, thus inducing the intermediaries to order it.
Purchase Probability Scale Purchase Probability Scale a scale to measure the probability of a buyer making a particular purchase.
Pure-Click Pure-Click companies that have launched a Web site without any previous existence as a firm.
Push Strategy Push Strategy when the manufacturer uses its sales force and trade promotion money to induce intermediaries to carry, promote, and sell the product to end users.
Python Python is a multiparadigm, general-purpose, interpreted, high-level programming language. Python allows programmers to use different programming styles to create simple or complex programs, get quicker results and write code almost as if speaking in a human language. Some of the popular systems and applications that have employed Python during development include Google Search, YouTube, BitTorrent, Google App Engine, Eve Online, Maya and iRobot machines.
PyTorch PyTorch is a Python-based scientific computing package that uses the power of graphics processing units. It is also one of the preferred deep learning research platforms, built to provide maximum flexibility and speed. It is known for two high-level features: tensor computation with strong GPU acceleration support, and deep neural networks built on a tape-based autograd system.


QphH The TPC-H Composite Query-per-Hour Performance (QphH@Size) is a metric used to reflect multiple aspects of a database system’s ability to process queries. These aspects include the selected database size against which the queries are executed, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users. The TPC-H Price/Performance metric is expressed as $/QphH@Size.
Quality Quality the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs.
Quantitative Analysis This field is highly focused on using algorithms to gain an edge in the financial sector. These algorithms either recommend or make trading decisions based on a huge amount of data, often within microseconds. Quantitative analysts are often called “quants.”
Quantity The number of products purchased in an ecommerce transaction.
Quartile A quartile is a statistical term describing a division of observations into four defined intervals based upon the values of the data and how they compare to the entire set of observations.
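The three cut points that divide the observations into four intervals can be computed with `statistics.quantiles` from the Python standard library (Python 3.8 or later). This is a small sketch; the wrapper name `quartiles` and the sample data are illustrative, and the choice of `method="inclusive"` is an assumption, since several quartile conventions exist and they can give slightly different values.

```python
from statistics import quantiles


def quartiles(values):
    """Return (Q1, Q2, Q3), the cut points between the four quartile intervals."""
    # n=4 asks for the 3 cut points that split the data into 4 groups;
    # "inclusive" treats the data as the whole population, min and max included.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


observations = [7, 1, 9, 3, 5, 2, 8, 4, 6]
q1, q2, q3 = quartiles(observations)  # (3.0, 5.0, 7.0)
```

The middle cut point Q2 is the median, and the distance Q3 - Q1 is the interquartile range.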
Query A query is a request for data or information from a database table or combination of tables. This data may be generated as results returned by Structured Query Language (SQL) or as pictorials, graphs or complex results, e.g., trend analyses from data-mining tools.
Query Analysis The process of analyzing a search query. Query analysis is done to optimize the query so that it returns the best possible results.


R R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.
Range In computer programming, range refers to possible variable values or the interval that includes the upper and lower bounds of an array. In statistics, range refers to the difference between the largest and smallest values in a data set. Because it depends only on the two most extreme observations, the range is sensitive to outliers and tends to grow as the sample size increases.
Ranking Ranking the actions or process of giving a specified rank or place within a grading system.
Re-identification Data re-identification is the process of matching anonymous data with available auxiliary data or information in order to identify the individual to whom the data belongs.
Reactive machines Reactive machines can analyze, perceive, and make predictions about experiences, but do not store data; they react to situations and act based on the given moment.
Real-time Analytics Real-time analytics refer to analytics that are able to be accessed as they come into a system – rather than having to wait for them to be processed. When used for data analytics this means identifying data patterns that provide meaning to a business in the here and now.
Real-time Data Real-time data refers to data that is presented as it is acquired. The idea of real-time data handling is now popular in new technologies such as those that deliver up-to-the-minute information in convenience apps to mobile devices such as phones, laptops and tablets.
Recommendation Engine A recommendation engine is a system that identifies and provides recommended content or digital items for users. As mobile apps and other advances in technology continue to change the way users choose and utilize information, the recommendation engine is becoming an integral part of applications and software products.
Recurrent neural network (RNN) Recurrent neural network (RNN) a type of neural network that makes sense of and creates outputs based on sequential information and pattern recognition.
Reference Data A big data term for data that describes an object along with its properties. The object described by reference data may be virtual or physical in nature.
Reference Groups Reference Groups all the groups that have a direct or indirect influence on a person’s attitudes or behavior.
Reference Prices Reference Prices pricing information a consumer retains in memory that is used to interpret and evaluate a new price.
Referral A referral is reported when a user clicks through to your website from another third-party website. The referrals report allows you to see all of the websites (by domain) that are sending you traffic. You can also drill-down into the referrals report to view the ‘Referral Path’ which allows you to see the individual pages linking to your website.
Referrals Referral marketing is the method of promoting products or services to new customers through referrals, usually word of mouth. Such referrals often happen spontaneously but businesses can influence this through appropriate strategies.
Referrers In its simplest terms, a referrer is any source that sends a new visitor to your website. This could include social media posts, Quora questions, images embedded with links, posts on other blogs, backlinks and more.
Regression Regression is a statistical measurement used in finance, investing, and other disciplines that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).
Regression Analysis A modelling technique used to define the association between variables. It assumes a one-way causal effect from predictor variables (independent variables) to a response of another variable (dependent variable). Regression can be used to explain the past and predict future events.
Regression Spline Regression splines are a non-linear approach that uses a combination of linear/polynomial functions to fit the data. In this technique, instead of building one model for the entire dataset, the dataset is divided into multiple bins and a separate model is built on each bin.
Regular Expression (or Regex) An advanced method of pattern matching in text strings. Regular expressions can be used in various places inside Google Analytics including view filters, goals, segments, table filters and more.
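As a rough illustration of pattern matching on page paths, the sketch below uses Python's `re` module. The pattern, the helper name `matching_paths` and the example paths are all hypothetical, and the regular expression syntax supported inside Google Analytics may differ in detail from Python's.

```python
import re

# Hypothetical pattern: blog post paths such as /blog/2023/some-title
#   ^/blog/        path must start with /blog/
#   \d{4}/         followed by a four-digit year and a slash
#   [a-z0-9-]+     then a slug of lowercase letters, digits or hyphens
#   /?$            optionally a trailing slash, then end of string
BLOG_POST = re.compile(r"^/blog/\d{4}/[a-z0-9-]+/?$")


def matching_paths(paths, pattern=BLOG_POST):
    """Return the paths that match the pattern, in their original order."""
    return [p for p in paths if pattern.search(p)]


paths = ["/blog/2023/regex-basics", "/blog/", "/pricing", "/blog/2021/filters/"]
blog_posts = matching_paths(paths)
```

Only the first and last paths match; `/blog/` has no year or slug, and `/pricing` does not start with `/blog/`.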
Regularization Regularization is a technique used to address the overfitting problem in statistical models. In machine learning, regularization penalizes the coefficients so that the model generalizes better. Regression techniques that use regularization include ridge regression and lasso regression.
Reinforcement Learning Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
Relational Database A relational database is a type of database made up of a set of formally described tables from which data can be accessed or reassembled in different ways. Data in a relational database is usually organized into tables.
Relationship Marketing Relationship Marketing building mutually satisfying long-term relationships with key parties, in order to earn and retain their business.
Relative Market Share Relative Market Share market share in relation to a company’s largest competitor.
Reliability Reliability a measure of the probability that a product will not malfunction or fail within a specified time period.
Repairability Repairability a measure of the ease of fixing a product when it malfunctions or fails.
Report Distribution Refers to the methodology used to circulate relevant information to appropriate stakeholders via manual or automated means. To learn more about how to automate your time-consuming report distribution process, see our whitepaper ‘Intuitive Reporting’.
Representativeness Heuristic Representativeness Heuristic when consumers base their predictions on how representative or similar an outcome is to other examples.
Residual Residual of a value is the difference between the observed value and the predicted value of the quantity of interest. Using the residual values, you can create residual plots which are useful for understanding the model.
Residual (Error) The residual is a measure of how much a real value differs from some statistical value we calculated based on the set of data. So given a prediction that it will be 20 degrees Fahrenheit at noon tomorrow, when noon arrives and it is only 18 degrees, we have an error of 2 degrees. This is often used interchangeably with the term “error,” even though, technically, error is a purely theoretical value.
Response Variable Response variables are also known as dependent variables, y-variables, and outcome variables. Typically, you want to determine whether changes in the predictors are associated with changes in the response.
Retailer (or retail store) Retailer (or retail store) any business enterprise whose sales volume comes primarily from retailing.
Retailing Retailing all the activities in selling goods or services directly to final consumers for personal, nonbusiness use.
Retargeting Retargeting is a form of advertising that works by tracking users who visit your site, placing a cookie on their device or browser and displaying an ad to that same user as they interact with other sites or applications. Retargeting helps to bring back those visitors who might have been interested at one time but need a little reminder to convert into a paying customer.
Returning Visitor A small number of reports reference returning and new visitors. A returning visitor is reported when someone with existing Google Analytics cookies comes back to your website. Users can be counted as both new and returning if they visit your website multiple times in the date range.
Revenue Sales revenue reported from transactions that have been tracked by Google Analytics. The revenue figures can include shipping and tax depending on the ecommerce tracking code that has been implemented.
Revenue Per User Total revenue divided by the number of users shows the average amount generated for each user.
Revenue Report A revenue report is similar to an attribution report but is solely focused on revenue generated from specific activities. For example, you might want to pull a revenue report if you’re interested in understanding exactly how much revenue your event marketing campaigns have generated.
Ridge Regression Ridge regression is a way to create a parsimonious model when the number of predictor variables in a set exceeds the number of observations, or when a data set has multicollinearity (correlations between predictor variables). Tikhonov regularization is essentially the same method, stated in a more general form. It can produce solutions even when your data set contains a lot of statistical noise (unexplained variation in a sample).
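The shrinking effect of the ridge penalty is easiest to see in the simplest possible case. The sketch below assumes a single predictor and no intercept, so the penalized least-squares problem (minimize the sum of squared errors plus `lam` times the squared slope) has the closed-form solution slope = Σxy / (Σx² + lam). The function name `ridge_slope` and the data are illustrative; real ridge regression with many predictors solves the analogous matrix equation.

```python
def ridge_slope(xs, ys, lam):
    """Ridge estimate of the slope in y ≈ slope * x (one predictor, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # lam = 0 gives ordinary least squares; larger lam shrinks the slope toward 0.
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                      # exactly y = 2x
ols = ridge_slope(xs, ys, lam=0.0)        # 28 / 14 = 2.0
shrunk = ridge_slope(xs, ys, lam=14.0)    # 28 / 28 = 1.0
```

The penalty trades a little bias for lower variance, which is why the method helps when predictors are highly correlated or outnumber the observations.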
Risk Analysis Risk analysis is the process of identifying and analyzing potential issues that could negatively impact key business initiatives or critical projects in order to help organizations avoid or mitigate those risks.
Robotics Robotics a field focused on the design and manufacturing of robots that exhibit and/or replicate human intelligence and actions.
ROC-AUC The AUC-ROC curve is a performance measurement for classification problems at various threshold settings. ROC is a probability curve, and AUC, the area under that curve, represents the degree or measure of separability. It tells how capable the model is of distinguishing between classes.
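One way to see AUC as a measure of separability is to use its well-known equivalent definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by comparing every positive/negative pair; the function name `roc_auc` and the sample scores are illustrative, and this quadratic-time approach is meant for explanation rather than large data sets.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 1.0 means every positive outranks every negative, while 0.5 is what random scoring would give on average.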
Role-based access control (RBAC) Role-based access control (RBAC) is a method of regulating access to computer or network resources based on the roles of individual users within an enterprise.
Root Mean Squared Error (RMSE) Root Mean Square Error (RMSE) is the standard deviation of the residuals (prediction errors). Residuals are a measure of how far from the regression line data points are; RMSE is a measure of how spread out these residuals are. In other words, it tells you how concentrated the data is around the line of best fit. Root mean square error is commonly used in climatology, forecasting, and regression analysis to verify experimental results.
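The calculation itself is short: take each residual (observed minus predicted), square it, average the squares and take the square root. The following is a minimal sketch with made-up numbers; the function names `residuals` and `rmse` are illustrative. Note that RMSE coincides with the standard deviation of the residuals only when the residuals average to zero.

```python
import math


def residuals(observed, predicted):
    """Observed value minus predicted value, for each data point."""
    return [o - p for o, p in zip(observed, predicted)]


def rmse(observed, predicted):
    """Square root of the mean of the squared residuals."""
    res = residuals(observed, predicted)
    return math.sqrt(sum(r * r for r in res) / len(res))


observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]
res = residuals(observed, predicted)   # [1.0, 0.0, -1.0, -2.0]
error = rmse(observed, predicted)      # sqrt((1 + 0 + 1 + 4) / 4) = sqrt(1.5)
```

Because the residuals are squared before averaging, a few large misses raise RMSE more than many small ones.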
Rotational Invariance In mathematics, a function defined on an inner product space is said to have rotational invariance if its value does not change when arbitrary rotations are applied to its argument.
Routing Analysis The process of finding the optimal route. It uses a range of transport variables to improve efficiency and reduce fuel costs.
Ruby Ruby is an open source, object-oriented programming language created by Yukihiro “Matz” Matsumoto. Designed to provide a programming language that focuses on simplicity and productivity, the creation of Ruby drew its inspiration from Lisp, Smalltalk and Perl. Although naturally object-oriented, Ruby can also be applied using procedural and functional programming styles.


SaaS Software as a service (SaaS) is a model for the distribution of software where customers access software over the Internet. In SaaS, a service provider hosts the application at its data center and a customer accesses it via a standard web browser.
Sales Analysis Sales Analysis measuring and evaluating actual sales in relation to goals.
Sales Budget Sales Budget a conservative estimate of the expected volume of sales, used for making current purchasing, production, and cash flow decisions.
Sales Dashboard One of the most widely utilized examples of business intelligence, sales dashboards collate and present relevant sales data including opportunities, pipelines, lead funnels, revenue, product performance, forecasting, customer profitability, and more. By persistently displaying organized sales data, businesses can leverage their notoriously competitive sales departments to drive customer acquisition. To learn more about sales dashboards and the KPIs that can improve your bottom line, download the Jet Global white paper ‘How to Build KPIs That Actually Drive Revenue.’
Sales Promotion Sales Promotion a collection of incentive tools, mostly short term, designed to stimulate quicker or greater purchase of particular products or services by consumers or the trade.
Sales Quota Sales Quota the sales goal set for a product line, company division, or sales representative.
Sales-Variance Analysis Sales-Variance Analysis a measure of the relative contribution of different factors to a gap in sales performance.
Sample A data set which consists of only a portion of the members from some population. Sample statistics are used to draw inferences about the entire population from the measurements of a sample.
Sampling In order to speed up the processing of reports, a portion of data is used to extrapolate (or estimate) the complete set of data for the report. Sampling occurs when you request specific data in your reports when there are more than 500,000 sessions in the property for the selected date range. The easiest way to reduce sampling is to reduce the selected date range.
Satisfaction Satisfaction a person’s feelings of pleasure or disappointment resulting from comparing a product’s perceived performance or outcome in relation to his or her expectations.
Scala Scala is a modern programming language that incorporates object-oriented and functional language procedures and features. It provides a general mechanism that aims to enhance productivity by reducing program code complexity and length.
Scalability Scalability is an attribute that describes the ability of a process, network, software or organization to grow and manage increased demand. A system, business or software that is described as scalable has an advantage because it is more adaptable to the changing needs or demands of its users or clients.
Scenario Analysis Scenario Analysis developing plausible representations of a firm’s possible future that make different assumptions about forces driving the market and include different uncertainties.
Schema A schema is the structure behind data organization. It is often shown as a visual representation of how tables relate to one another, and it reflects the business rules for which the database is created.
Score In HubSpot Attribution Models, each URL or source is given a score based on its value. A high score means that the URL or source drives more conversions whereas a low score means that the URL or source is not driving a lot of conversions. The score is calculated based on the Attribution model you select.
Scorecard A graphical representation of the progress over time of an enterprise, employee, or business unit, toward some specified goal or goals highlighting relevant KPIs. Performance scorecards are widely used in many industries throughout both the public and private sectors.
Search Query The actual term somebody used in a search engine before clicking through to your website. Depending on the report, the terms can be from paid ads (inside the AdWords reports), or from Google organic search results (inside the Search Console reports).
Search Term If your website has an internal search function you can configure the Site Search reports to show the particular terms people are using as they search your website.
Secondary Groups Secondary Groups groups that tend to be more formal and require less interaction than primary groups, such as religious, professional, and trade-union groups.
Segments A segment is a defined portion or section of something larger such as a database, geometric object, or network. The term is used in database management, graphics, and communications. Defining segments within your audience will allow you to better sort and understand your data. Segmentation allows marketers to quickly view and react to people who are taking certain behaviors or that meet specific criteria.
Selective Attention Selective Attention the mental process of screening out certain stimuli while noticing others.
Selective Distortion Selective Distortion the tendency to interpret product information in a way that fits consumer perceptions.
Selective Distribution Selective Distribution the use of more than a few but less than all of the intermediaries who are willing to carry a particular product.
Selective Retention Selective Retention good points about a product that consumers like are remembered and good points about competing products are forgotten.
Self-Referral Referrals coming from your own website are called ‘self-referrals’. This can occur if there is a page (or pages) on your website that doesn’t have the Google Analytics tracking code installed. For example, if a page is missing the tracking code or if your website spans multiple domains. In most cases, you will want to correct the tracking issue to remove (or reduce) the self-referrals. This is because a new session is created when someone clicks from the page (or pages) causing the self-referral.
Selling Concept Selling Concept holds that consumers and businesses, if left alone, won’t buy enough of the organization’s products.
Semi-Structured Data Semi-structured data is data that is neither raw data, nor typed data in a conventional database system. It is structured data, but it is not organized in a relational model, like a table or an object-based graph. A lot of data found on the Web can be described as semi-structured. Data integration especially makes use of semi-structured data.
Semi-Supervised Learning Semi-supervised learning is a method used to enable machines to classify both tangible and intangible objects. The objects the machines need to classify or identify could be as varied as inferring the learning patterns of students from classroom videos to drawing inferences from data theft attempts on servers. To learn and infer about objects, machines are provided labeled, shallow information about various types of data based on which the machines need to learn from large, structured and unstructured data they receive regularly.
Sentiment Analysis Sentiment analysis is a type of data mining that measures the inclination of people’s opinions through natural language processing (NLP), computational linguistics and text analysis, which are used to extract and analyze subjective information from the Web, mostly social media and similar sources. The analyzed data quantifies the general public's sentiments or reactions toward certain products, people or ideas and reveals the contextual polarity of the information.
Served Market Served Market all the buyers who are able and willing to buy a company’s product.
Served Market Share Served Market Share a company’s sales expressed as a percentage of the total sales to its served market.
Server A server is a virtual or physical computer that receives requests related to a software application and sends responses back over a network. It is a common term used in almost all big data technologies.
Service Service any act or performance that one party can offer to another that is essentially intangible and does not result in the ownership of anything.
Services Delivery Manager (BI delivery team) Has accountability for overall success of BI project, within a BI delivery team. The Services Delivery Manager is responsible for staffing and managing the vendor delivery team. The SDM also manages communications with the organization – usually working with product sponsor and business drivers to do so. For more on the roles involved in a BI project and how they impact overall data quality, download our white paper, ”A New Twist on Data Governance from Jet Global.”
Sessions The term session (often called visits) refers to the activity a visitor takes on your application or website throughout a given time. The actual amount of time that is attributed to a session varies depending on your analytics solution. For example, a two-hour session could include the projects, purchases, and reports that a user engaged with while on your application. After a period of inactivity, the session ends and a new session will begin as soon as the user comes back to your site or application.
Share Penetration Index A comparison of a company’s current market share to its potential market share.
Shopping Goods Goods that the consumer, in the process of selection and purchase, characteristically compares on such bases as suitability, quality, price, and style.
Short-term Memory (STM) A temporary repository of information.
Significant Difference The term used to describe the results of a statistical hypothesis test where a difference is too large to be reasonably attributed to chance.
Simple Decay This is an Attribution Report model in HubSpot that gives the six most recent interactions credit for the conversion. For example, if you visited seven pages before converting to a lead, it would give credit to the final six pages you visited. Page seven would get 50% more credit than page six. Page six would get 50% more credit than page five. And so on.
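The Simple Decay weighting described above can be sketched in a few lines. This is only an illustration of the "each page gets 50% more credit than the one before it" rule as stated in this glossary; the function name `simple_decay_credit` and the choice to normalise the weights so they sum to 100% are assumptions, not HubSpot's implementation.

```python
def simple_decay_credit(pages, window=6, growth=1.5):
    """Split one conversion's credit across the last `window` pages.

    Each page is weighted `growth` times the page before it, following the
    "50% more credit" description. Normalising the weights so that they
    sum to 1 is an assumption of this sketch.
    """
    recent = pages[-window:]                       # only the most recent interactions count
    weights = [growth ** i for i in range(len(recent))]
    total = sum(weights)
    return {page: w / total for page, w in zip(recent, weights)}


# Seven page views before converting; only the last six receive credit.
visited = [f"page-{n}" for n in range(1, 8)]
credit = simple_decay_credit(visited)
for page, share in credit.items():
    print(f"{page}: {share:.1%}")
```

With seven pages visited, page-1 drops out of the window and page-7 receives the largest share.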
Single-variance Test (Chi-square Test) Compares the variance of one sample of data to a target. Uses the Chi-square distribution.
Singularity The singularity (also referred to as the technological singularity) is a term popularized by inventor Ray Kurzweil and refers to the point at which we experience an intelligence explosion, where the level of self-improving and learning AI overtakes and rapidly leaves behind our own, ultimately reaching superintelligence. It’s hypothesized that the singularity is the point we realize we’ve reached (or are imminently going to reach) a level of artificial superintelligence and can expect a subsequent explosion in technological and scientific growth at a runaway rate. This will almost certainly result in unimaginable changes to human civilization.
Site Search Google Analytics can be configured to track people using your website’s internal search function. The site search reports allow you to see the search terms people are using, repeat searches, search categories, the pages people begin searching from and the percentage of sessions that included a search.
Skewness Skewness is asymmetry in a statistical distribution, in which the curve appears distorted or skewed either to the left or to the right. Skewness can be quantified to define the extent to which a distribution differs from a normal distribution.
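One common way to quantify skewness is the moment coefficient: the third central moment divided by the second central moment raised to the power 1.5. The sketch below uses that definition with made-up data; other skewness measures exist, so treat this as one possible choice.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))      # 0.0 for perfectly symmetric data
print(skewness(right_tailed))   # positive: the long tail is on the right
```

A negative value would indicate a longer tail on the left.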
Slice and dice To divide a quantity of information up into smaller parts, especially in order to analyse it more closely or in different ways.
Slowly Changing Dimensions (SCD) Refers to data dimensions that change slowly and unpredictably, rather than on a static or fixed schedule.
Smart Goals If you’re unable to manually configure your own goals, then you can make use of Google’s machine learning to identify sessions that are most likely to result in a conversion.
SMOTE SMOTE stands for Synthetic Minority Oversampling Technique. This is a statistical technique for increasing the number of minority-class cases in your dataset in a balanced way. It works by generating new synthetic instances from the existing minority cases that you supply as input.
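The core idea of SMOTE, creating a new point somewhere on the line between a minority case and one of its nearest minority neighbours, can be sketched as follows. This is a hand-rolled illustration using only the standard library, not the implementation found in any particular package; the function name `smote_sketch` and the sample points are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate `n_new` synthetic points by interpolating between minority cases."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the chosen point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbour)))
    return synthetic


minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority_points, n_new=5)
print(new_points)
```

Because every synthetic point lies between two existing minority cases, the new data stays inside the region the minority class already occupies.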
Snapshot A brief look or summary. A record of the contents of a storage location or data file at a given time.
Snowflake Schema An arrangement of tables in a multidimensional database such that the physical model resembles a snowflake shape. The snowflake schema consists of centralized fact tables which are connected to multiple dimensions.
Social Social appears as a marketing channel (in the default channel grouping) in the Acquisition reports which automatically includes traffic coming from social media, including Twitter and Facebook. The Acquisition reports also include a dedicated set of social reports to further analyze and report on the performance of your inbound social traffic.
Social Classes Homogeneous and enduring divisions in a society, which are hierarchically ordered and whose members share similar values, interests, and behavior.
Social Marketing Marketing done by a nonprofit or government organization to further a cause, such as “say no to drugs.”
Social Plugins Google Analytics can be configured to track people engaging with social sharing widgets embedded within your website. The social plugins report then allows you to report on the pages people are on when they use your social sharing widgets, the social networks they use and the actions they’ve taken.
Software as a Service (SaaS) Software as a service (SaaS) is a model for the distribution of software where customers access software over the Internet. In SaaS, a service provider hosts the application at its data center and a customer accesses it via a standard web browser.
Sources A source can be any offline or online channel that drives traffic or lead generation. Like referrers, sources can include search engines, social media, blog posts, etc. Unlike referrers, sources may also include specific campaigns. For example, an offline direct mail campaign could be a source that is equally important to monitor and measure within your analytics.
Spark (Apache Spark) Apache Spark is an open source parallel processing framework for running large-scale data analytics applications across clustered computers. It can handle both batch and real-time analytics and data processing workloads.
Spatial Analysis The analysis of spatial data i.e. topological and geographic data is known as spatial analysis. This analysis helps to identify and understand everything about a particular area or position.
Spatial-Temporal Reasoning Spatial-temporal reasoning is an area of artificial intelligence which draws from the fields of computer science, cognitive science, and cognitive psychology. Spatial-temporal reasoning is the ability to mentally move objects in space and time to solve multi-step problems. Three important things about Spatial-temporal reasoning are: 1. It connects to mathematics at all levels, from kindergarten to calculus 2. It is innate in humans 3. Spatial-temporal reasoning abilities can be increased. This understanding of Spatial-temporal reasoning forms the foundation of Spatial-temporal Math.
Specialty Goods Goods with unique characteristics or brand identification for which enough buyers are willing to make a special purchasing effort.
Sponsorship Financial support of an event or activity in return for recognition and acknowledgment as the sponsor.
SQL (Structured Query Language) Structured Query Language (SQL) is a standard computer language for relational database management and data manipulation. SQL is used to query, insert, update and modify data. Most relational databases support SQL, which is an added benefit for database administrators (DBAs), as they are often required to support databases across several different platforms.
SQL Server Analysis Services (SSAS) Microsoft SQL Server Analysis Services (or SSAS) is an OLAP data mining and reporting tool in Microsoft SQL server. SSAS is used to analyze, access and present information spread across multiple databases or in disparate tables. See also Analysis Services.
SQL Server Integration Services (SSIS) An enterprise data integration, transformation, and migration tool that is built into Microsoft’s SQL Server. It is used for a variety of integration-related tasks, such as analyzing and cleansing data or running ETL processes to update data warehouses. SSIS can consolidate data from multiple relational databases as well as sources such as XML data files and flat files.
Sqoop Apache Sqoop ("SQL to Hadoop") is a Java-based, console-mode application designed for transferring bulk data between Apache Hadoop and non-Hadoop datastores, such as relational databases, NoSQL databases and data warehouses. Version 1.4.4 was released on July 31, 2013.
Stakeholder-Performance Scorecard A measure to track the satisfaction of various constituencies who have a critical interest in and impact on the company’s performance.
Standard Deviation The standard deviation is a statistic that measures the dispersion of a dataset relative to its mean. It is calculated as the square root of the variance, which is the average squared deviation of each data point from the mean. If the data points are further from the mean, there is a higher deviation within the data set; thus, the more spread out the data, the higher the standard deviation.
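As a small worked example of the standard deviation calculation, the snippet below computes the mean, the variance and its square root for a made-up dataset. It uses the population form (dividing by n); the sample form divides by n - 1 instead.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
# Population variance: the average squared distance from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```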
Standard error A standard error is the standard deviation of the sampling distribution of a statistic. It measures the accuracy with which a sample represents a population. In statistics, a sample mean typically deviates from the actual mean of the population; the standard error of the mean describes the typical size of this deviation.
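For the most common case, the standard error of the mean, the usual estimate is the sample standard deviation divided by the square root of the sample size. The sketch below assumes that formula and uses invented measurements; the helper name `standard_error_of_mean` is hypothetical.

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    s = statistics.stdev(sample)          # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(len(sample))


sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
se = standard_error_of_mean(sample)
print(f"mean = {statistics.mean(sample):.3f}, standard error = {se:.3f}")
```

Larger samples give a smaller standard error, which is why bigger samples estimate the population mean more precisely.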
Standardization Standardization or standardisation is the process of implementing and developing technical standards based on the consensus of different parties that include firms, users, interest groups, standards organizations and governments.
Star Schema The simplest style of data mart schema and most common approach to developing data warehouses and dimensional data marts. The star schema gets its name from the physical model’s resemblance to a star shape with a fact table at its center and the dimension tables surrounding it representing the star’s points.
Startup Program Startup Programs are specifically designed to help young startup companies. For example, Exasol’s startup program is specifically designed to help young, innovative companies in the area of big data and analytics.
Statistic vs. Statistics Statistics (plural) is the entire set of tools and methods used to analyze a set of data. A statistic (singular) is a value that we calculate or infer from data. We get the median (a statistic) of a set of numbers by using techniques from the field of statistics.
Statistical Significance A result is statistically significant when we judge that it probably didn’t happen due to chance. It is widely used in surveys and statistical studies, though it is not always an indication of practical value. The mathematical details of statistical significance are beyond the scope of this post.
Statistical Tools There are a number of statistics data professionals use to reason and communicate information about their data. These are some of the most basic and vital statistical tools to help you get started.
Statistics Statistics is the discipline that concerns the collection, organization, displaying, analysis, interpretation and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied.
Stochastic Gradient Descent Stochastic Gradient Descent is a type of gradient descent algorithm where we take a sample of data while computing the gradient. The update to the coefficients is performed for each training instance, rather than at the end of the batch of instances. The learning can be much faster with stochastic gradient descent for very large training datasets, and often one only needs a small number of passes through the dataset to reach a good or good enough set of coefficients.
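To illustrate the per-instance update that distinguishes stochastic gradient descent, here is a minimal sketch that fits a straight line y = w*x + b with squared error. The data, learning rate and number of passes are arbitrary choices made for this example.

```python
import random


def sgd_linear_fit(xs, ys, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = w*x + b, updating the coefficients after every training instance."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                 # visit the instances in random order
        for i in indices:
            error = (w * xs[i] + b) - ys[i]  # prediction error for this one instance
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1
w, b = sgd_linear_fit(xs, ys)
print(round(w, 3), round(b, 3))  # should land close to 2 and 1
```

Batch gradient descent would instead sum the errors over all five points before making a single update per pass.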
Stored Procedure A batch of SQL statements that can be used to perform a specific task and shared across a network by several clients using different input data. Stored procedures can reduce network traffic, improve performance, and help protect against SQL injection when inputs are passed as parameters.
Storm Apache Storm is an open-source Apache tool used to process unbounded streams of data. Providing real-time data processing solutions, Storm provides a topology to control data transfers, which is a critical part of routing data where it needs to go for analytics and other operations. It coordinates with other kinds of Apache tools such as database or scaling resources.
Straight Extension Introducing a product in a foreign market without any change in the product.
Strategic Brand Management The design and implementation of marketing activities and programs to build, measure, and manage brands to maximize their value.
Strategic Business Units (SBUs) A single business or collection of related businesses that can be planned separately from the rest of the company, with its own set of competitors and a manager who is responsible for strategic planning and profit performance.
Strategic Group Firms pursuing the same strategy directed to the same target market.
Strategic Marketing Plan Laying out the target markets and the value proposition that will be offered, based on analysis of the best market opportunities.
Strategy A company’s game plan for achieving its goals.
Stream Processing Stream processing is designed to act on real-time and streaming data with “continuous” queries. Combined with streaming analytics (i.e. the ability to continuously calculate mathematical or statistical analytics on the fly within the stream), stream processing solutions are designed to handle high volumes in real time.
Strong AI Strong artificial intelligence (strong AI) is an artificial intelligence construct that has mental capabilities and functions that mimic the human brain. In the philosophy of strong AI, there is no essential difference between the piece of software, which is the AI, exactly emulating the actions of the human brain, and actions of a human being, including its power of understanding and even its consciousness.
Structured Data Structured data is far easier for Big Data programs to digest, while the myriad formats of unstructured data create a greater challenge. Yet both types of data play a key role in effective data analysis.
Structured Query Language (SQL) SQL is a standard programming language that is used to retrieve and manage data in a relational database. This language is very useful to create and query relational databases.
Structured v Unstructured Data This is one of the ‘V’s of Big Data, i.e. Variety. Structured data is basically anything that can be put into relational databases and organized in such a way that it relates to other data via tables. Unstructured data is everything that can’t: email messages, social media posts, recorded human speech, etc.
Sub-Brand A new brand combined with an existing brand.
Subculture Subdivisions of a culture that provide more specific identification and socialization, such as nationalities, religions, racial groups, and geographical regions.
Subject Matter Expert (SME) A person who is an authority in a particular area or topic. In some cases, this person can become the go-to resource for ad hoc analysis. The number of subject matter experts your implementation team needs will be determined by the number of user groups from within the organization that will be using the BI tool (finance, operations, sales, HR, etc.).
Subliminal Perception Receiving and processing subconscious messages that affect behavior.
Sum of Squares In ANOVA, the total sum of squares helps express the total variation that can be attributed to various factors. From the ANOVA table, %SS is the Sum of Squares of the Factor divided by the Sum of Squares Total. Similar to R² in regression.
Summary Statistics Summary statistics are the measures we use to communicate insights about our data in a simple way. Examples of summary statistics are the mean, median and standard deviation.
Superintelligence When an AI reaches a level of general intelligence that massively exceeds our own. This new level of intelligence is known as superintelligence.
Supervised Learning Supervised learning, in the context of artificial intelligence (AI) and machine learning, is a type of system in which both input and desired output data are provided. Input and output data are labelled for classification to provide a learning basis for future data processing.
Supervised learning A type of machine learning where output datasets teach machines to generate desired outcomes or algorithms (akin to a teacher-student relationship).
Supervised Machine Learning With supervised learning techniques, the data scientist gives the computer a well-defined set of data. All of the columns are labelled and the computer knows exactly what it’s looking for. It’s similar to a professor handing you a syllabus and telling you what to expect on the final.
Supplies and Business Services Short-term goods and services that facilitate developing or managing the finished product.
Supply Chain Management (SCM) Supply Chain Management (SCM) is the oversight of materials, information, and finances as they move in a process from supplier to manufacturer to wholesaler to retailer to consumer.
Supply-side Methods Approximating the amount of time or space devoted to media coverage of an event, for example, the number of seconds the brand is clearly visible on a television screen or the column inches of press clippings that mention it.
Surrogate key Within a database, the surrogate key is a system-wide unique identifier. A surrogate key generally has several properties: its value is never manipulated by a system or a user, it carries no semantic meaning, and it is not composed of multiple values. The surrogate key is not derived from application data, unlike a natural (or business) key which is derived from application data. Having the key independent of all other columns insulates the database relationships from changes in data values or database design (making the database more agile) and guarantees uniqueness.
SVM A support vector machine (SVM) is a machine learning algorithm that analyzes data for classification and regression analysis. SVM is a supervised learning method that looks at data and sorts it into one of two categories. An SVM outputs a map of the sorted data with the margins between the two as far apart as possible. SVMs are used in text categorization, image classification, handwriting recognition and in the sciences. A support vector machine is also known as a support vector network (SVN).


T-Test A t-test’s statistical significance indicates whether or not the difference between two groups’ averages most likely reflects a “real” difference in the population from which the groups were sampled.
Table Relations Table relations are a key component in data set building: they match common fields in tables that are related. To ensure data accuracy and limit redundancy, data is split into subject-based tables so that each fact is represented only once; table relationships are then defined and common fields mapped to paint a complete picture.
Tactical Marketing Plan Marketing tactics, including product features, promotion, merchandising, pricing, sales channels, and service.
Target Costing Deducting the desired profit margin from the price at which a product will sell, given its appeal and competitors’ prices.
Target Market The part of the qualified available market the company decides to pursue.
Target-Return Pricing Determining the price that would yield the firm’s target rate of return on investment (ROI).
Taxonomy With so much data, it can be difficult to break down and analyze the information you receive to extract the most meaningful insights possible. Taxonomy is a way of organizing your data into categories and subcategories to allow for greater segmentation and filtering. For example, if you’re tracking blog post engagement within Wordpress, the taxonomies you could include for additional filtering might include comments, searches, article views and blog subscriptions.
Technology-enabled relationship management (TERM) The concept of forming one enterprise-wide view of the customer across all customer contact channels (i.e., sales, marketing, and customer service and support). It is a complex area, requiring complex solutions to problems of integration, data flow, data access and marketing strategy. A critical component is the database that serves as the customer information repository.
Telemarketing The use of telephone and call centers to attract prospects, sell to existing customers, and provide service by taking orders and answering questions.
TensorFlow TensorFlow is a free software library focused on machine learning created by Google. Released under the Apache 2.0 open-source license, TensorFlow was originally developed by engineers and researchers of the Google Brain Team, mainly for internal use. TensorFlow is considered the successor of the closed-source application DistBelief and is presently used by Google for research and production purposes. It is widely regarded as one of the major frameworks focused on deep learning.
Terabyte A terabyte (TB) is a unit of digital information storage used to denote the size of data. It is equivalent to 1,000 gigabytes, or 1,000,000,000,000 bytes, using the SI standard.
Test for Equal Variance (F-test) In statistics, an F-test of equality of variances is a test for the null hypothesis that two normal populations have the same variance. ... This particular situation is of importance in mathematical statistics since it provides a basic exemplar case in which the F-distribution can be derived.
Test Statistic A standardized value (Z, t, F, etc.) computed from sample data that measures how far the data depart from what H0 predicts. It is distributed in a known manner under H0, so the probability of observing a value at least as extreme can be determined.
Text Analytics Text analytics is a general practice of applying algorithms or programs to text in order to analyze that text. Text analytics are also known as text mining.
Thrift Apache Thrift is a software framework used for the development of scalable cross-language services. It integrates a code generation engine with a software stack to develop services that can work seamlessly and efficiently between different programming languages such as Ruby, Java, PHP, C++, Python, C# and others.
Time Decay This is an Attribution Model in Google Analytics that gives more credit to the touchpoints that were closest in time to the conversion. In HubSpot, the most similar model is Simple Decay, but this doesn't take time into consideration.
Time on Page The time someone goes to the next page minus the time a visitor originally came to the page. This metric is calculated in Google Analytics.
Time on Site The average amount of time a visitor spends on your site within a certain time period. Many marketers use this metric to get an idea of the effectiveness of their website. The longer someone spends on your website, the more effective your website probably is. This metric is calculated in Google Analytics.
Time Series A time series is a set of data that is ordered by when each data point occurred. Think of stock market prices over the course of a month, or the temperature throughout a day.
Time Series Analysis Time Series is a sequence of well-defined data points measured at consistent time intervals over a period of time. Data collected on an ad-hoc basis or irregularly does not form a time series. Time series analysis is the use of statistical methods to analyze time series data and extract meaningful statistics and characteristics about the data.
Tokenization Tokenization is the process of splitting a text string into units called tokens. The tokens may be words or a group of words. It is a crucial step in Natural Language Processing.
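A very simple tokenizer can be written with a regular expression. The sketch below lowercases the text and keeps runs of letters, digits and apostrophes as word tokens, then pairs neighbouring words to form two-word tokens (bigrams). Real NLP libraries handle punctuation, contractions and non-English text far more carefully, so this is only an illustration; both function names are hypothetical.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bigrams(tokens):
    """Group neighbouring tokens into two-word tokens."""
    return list(zip(tokens, tokens[1:]))


tokens = tokenize("Tokenization is a crucial step in Natural Language Processing.")
print(tokens)
# ['tokenization', 'is', 'a', 'crucial', 'step', 'in', 'natural', 'language', 'processing']
print(bigrams(tokens)[:2])
# [('tokenization', 'is'), ('is', 'a')]
```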
Topological Data Analysis Analysis techniques focusing on the theoretical shape of complex data with the intent of identifying clusters and other statistically significant trends that may be present.
Torch Torch is an open source machine learning library, based on the Lua programming language. It provides a wide range of algorithms for deep learning.
Total Costs The sum of the fixed and variable costs for any given level of production.
Total Customer Benefit The perceived monetary value of the bundle of economic, functional, and psychological benefits customers expect from a given market offering because of the product, service, people, and image.
Total Customer Cost The bundle of costs customers expect to incur in evaluating, obtaining, using, and disposing of the given market offering, including monetary, time, energy, and psychic costs.
Total Customer Value The perceived monetary value of the bundle of economic, functional, and psychological benefits customers expect from a given market offering.
Total Market Potential The maximum sales available to all firms in an industry during a given period, under a given level of industry marketing effort and environmental conditions.
Total Quality Management An organization-wide approach to continuously improving the quality of all the organization’s processes, products, and services.
Touchpoints Touchpoints are all of the various interactions that a user has with your business along the path to conversion. This could include anything from their first email response to the live chat conversation they have with a support representative. By tracking the different touchpoints a visitor makes along the path to conversion, marketers gain a better understanding of which touchpoints are most valuable and how many it takes to convert an interested visitor into a paying customer.
TPC-H Benchmark A TPC-H Benchmark is a transaction processing and database benchmark specific to decision support – i.e. analytics, run and managed by the Transaction Processing Performance Council. Exasol holds the number one position in the TPC-H benchmark for both raw performance and price performance on data volumes ranging from 300GB through to 100TB.
Tracking ID In order to send hits to the appropriate property inside Google Analytics, a tracking ID is included in the tracking code (or Google Tag Manager tag). The tracking ID starts with ‘UA’, followed by a series of numbers, for example, UA-123456-1. The number between the dashes is a unique identifier for the Google Analytics account and the number at the end identifies a property within the account.
Tracking Studies Collecting information from consumers on a routine basis over time.
Tracking URL A tracking URL is a regular URL with a token or UTM parameter (learn about UTM parameters below) assigned to it. Tracking URLs allow marketers to track where specific traffic originated.
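A tracking URL can be built by appending UTM parameters to an ordinary URL. The sketch below uses Python's standard `urllib.parse` module; the helper name `add_utm`, the example.com address and the campaign values are placeholders invented for this illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm(url, source, medium, campaign):
    """Return `url` with utm_source, utm_medium and utm_campaign appended."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)  # keep any existing parameters
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


tracking_url = add_utm(
    "https://www.example.com/pricing?plan=pro", "newsletter", "email", "spring_sale"
)
print(tracking_url)
# https://www.example.com/pricing?plan=pro&utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale
```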
Training and Testing This is part of the machine learning workflow. When making a predictive model, you first offer it a set of training data so it can build understanding. Then you pass the model a test set, where it applies its understanding and tries to predict a target value.
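The training and testing steps can be sketched with a toy dataset. Everything here is invented for illustration: the (hours studied, passed) rows, the 80/20 split and the deliberately simple "model", which just learns a pass/fail threshold from the training rows.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold back a fraction of them as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


# Hypothetical data: (hours_studied, passed_exam)
rows = [(hours, hours >= 5) for hours in range(1, 11)]
train, test = train_test_split(rows)

# Training: learn a threshold halfway between the highest failing
# and the lowest passing number of hours seen in the training set.
highest_fail = max(h for h, passed in train if not passed)
lowest_pass = min(h for h, passed in train if passed)
threshold = (highest_fail + lowest_pass) / 2

# Testing: predict the target for rows the model has never seen.
correct = sum((h >= threshold) == passed for h, passed in test)
print(f"test accuracy: {correct}/{len(test)}")
```

Keeping the test rows out of training is what makes the resulting accuracy an honest estimate of how the model behaves on new data.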
Transaction A trade of values between two or more parties: A gives X to B and receives Y in return.
Transactional Data Transactional data are information directly derived as a result of transactions. Unlike other sorts of data, transactional data contains a time dimension which means that there is timeliness to it and over time, it becomes less relevant.
Transactions Per User The number of transactions divided by the number of users. This metric can provide insights into how well your website is performing based on ecommerce transactions.
Transfer In the case of gifts, subsidies, and charitable contributions: A gives X to B but does not receive anything tangible in return.
Transfer Learning Transfer learning is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks.
Transfer Price The price a company charges another unit in the company for goods it ships to foreign subsidiaries.
Transformational Appeal Elaborates on a nonproduct-related benefit or image.
Trend A direction or sequence of events that has some momentum and durability.
True Negative These are the points which are actually false and which we have predicted as false. For example, suppose we have to predict whether a loan will be approved or not. Y represents that the loan will be approved, whereas N represents that the loan will not be approved. Here the true negatives are the cases which are actually N and which we have predicted as N as well.
True Positive These are the points which are actually true and which we have predicted as true. For example, suppose we have to predict whether a loan will be approved or not. Y represents that the loan will be approved, whereas N represents that the loan will not be approved. Here the true positives are the cases which are actually Y and which we have predicted as Y as well.
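Using the loan example from the two entries above, the four outcome counts and the accuracy formula from the Accuracy entry can be computed as follows. The actual and predicted labels are made up for illustration.

```python
# Y = loan approved, N = loan not approved (invented labels)
actual    = ["Y", "N", "Y", "Y", "N", "N", "Y", "N"]
predicted = ["Y", "N", "N", "Y", "N", "Y", "Y", "N"]

pairs = list(zip(actual, predicted))
tp = sum(a == "Y" and p == "Y" for a, p in pairs)  # true positives
tn = sum(a == "N" and p == "N" for a, p in pairs)  # true negatives
fp = sum(a == "N" and p == "Y" for a, p in pairs)  # false positives
fn = sum(a == "Y" and p == "N" for a, p in pairs)  # false negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, accuracy)  # 3 3 1 1 0.75
```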
Turing Test A test created by computer scientist Alan Turing (1950) to see if machines could exhibit intelligence equal to or indistinguishable from that of a human.
Two Sample t-test The two-sample t-test is one of the most commonly used hypothesis tests in Six Sigma work. It is applied to compare whether the average difference between two groups is really significant or if it is due instead to random chance. It helps to answer questions like whether the average success rate is higher after implementing a new sales tool than before or whether the test results of patients who received a drug are better than test results of those who received a placebo.
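The test statistic behind a two-sample t-test can be computed with the standard library. The sketch below uses Welch's variant, which does not assume equal variances in the two groups; that choice, the function name `welch_t` and the before/after figures are assumptions made for this example. Turning the statistic into a p-value additionally requires the t distribution, which is left to a statistics package.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error of each group's mean.
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical weekly sales before and after introducing a new sales tool.
before = [20, 22, 19, 24, 21, 23]
after = [25, 27, 24, 29, 26, 28]
t, df = welch_t(after, before)
print(f"t = {t:.2f}, degrees of freedom = {df:.1f}")
```

A large absolute t value relative to the t distribution with `df` degrees of freedom suggests the difference in averages is unlikely to be due to chance alone.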
Tying Agreements Agreements in which producers of strong brands sell their products to dealers only if dealers purchase related products or services, such as other products in the brand line.
Type I error When the null hypothesis is true and you reject it, you make a type I error. The probability of making a type I error is α, which is the level of significance you set for your hypothesis test. An α of 0.05 indicates that you are willing to accept a 5% chance that you are wrong when you reject the null hypothesis. To lower this risk, you must use a lower value for α. However, using a lower value for alpha means that you will be less likely to detect a true difference if one really exists.
Type II error When the null hypothesis is false and you fail to reject it, you make a type II error. The probability of making a type II error is β, which depends on the power of the test. You can decrease your risk of committing a type II error by ensuring your test has enough power. You can do this by ensuring your sample size is large enough to detect a practical difference when one truly exists.


UDFs User-defined functions (UDFs) are functions written by users to perform specific tasks within a larger system. Often used in SQL databases, UDFs provide a mechanism for extending the functionality of the database server by adding a function that can be evaluated in SQL statements.
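As one concrete illustration, SQLite lets an application register a function that can then be called from SQL. The sketch below uses `create_function` from Python's built-in `sqlite3` module; the `domain_of` helper, the `users` table and its rows are hypothetical.

```python
import sqlite3


def domain_of(email):
    """Return the part of an email address after the '@', in lowercase."""
    return email.split("@", 1)[1].lower()


conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument SQL function.
conn.create_function("domain_of", 1, domain_of)

conn.execute("CREATE TABLE users (email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [("ann@Example.com",), ("bob@example.com",), ("cy@test.org",)],
)

rows = conn.execute(
    "SELECT domain_of(email) AS email_domain, COUNT(*) "
    "FROM users GROUP BY email_domain ORDER BY email_domain"
).fetchall()
print(rows)  # [('example.com', 2), ('test.org', 1)]
```

Other database servers define UDFs differently, often in SQL or a procedural dialect, so the registration step shown here is specific to SQLite.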
Underfitting Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. It refers to a model that can neither model on the training data nor generalize to new data. An underfit model is not a suitable model as it will have poor performance on the training data.
Unfriendly Artificial Intelligence Whether or not a general AI is aligned with our values and can therefore be considered ‘human friendly’ is a key concern when it comes to the ethical implications and existential risks studied in the field of AI development. An unfriendly AI is considered a very possible threat if safeguards aren’t put in place to prevent the development of AI that pursues goals misaligned with our own. See also friendly AI.
Uniform Resource Locator (URL) Uniform Resource Locator (URL) is the character string that identifies an Internet document’s exact name and location.
Unique Pageview Counts a page once even if it was viewed multiple times within a single session. For example, if someone landed on your homepage, then viewed the ‘about us’ page and then navigated back to your homepage, the homepage would have one unique pageview (even though the page was viewed twice during the session).
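The difference between pageviews and unique pageviews can be sketched in a few lines. This is an illustrative calculation only, assuming each hit is recorded as a (session ID, page) pair; the function name and sample hits are hypothetical and do not reflect how any analytics product stores its data.

```python
from collections import Counter

def pageview_counts(hits):
    """hits: iterable of (session_id, page) pairs, one per page load."""
    hits = list(hits)
    # Every page load counts as a pageview.
    pageviews = Counter(page for _, page in hits)
    # A page counts once per session, however often it was viewed.
    unique_pageviews = Counter(page for _, page in set(hits))
    return pageviews, unique_pageviews

# Session s1 views the homepage, the about page, then the homepage again.
hits = [("s1", "/"), ("s1", "/about"), ("s1", "/"), ("s2", "/")]
pageviews, unique_pageviews = pageview_counts(hits)
```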
Unique Visitors Anyone who has accessed your website at any point in time is counted as a unique visitor. This is tracked by a cookie placed on the visitor's browser or device, along with their associated IP address. As long as the visitor comes back to your site from the same browser and device, they're counted as one unique visitor. If, however, a visitor clears their cookies or visits your site through a different browser, they're then counted as two unique visitors.
Univariate Analysis Univariate analysis is perhaps the simplest form of statistical analysis. Like other forms of statistics, it can be inferential or descriptive. The key fact is that only one variable is involved. Univariate analysis can yield misleading results in cases in which multivariate analysis is more appropriate.
Unstructured Data Data for which no predefined structure can be specified is known as unstructured data. Unstructured data is difficult to process and manage. Common examples are the text of email messages and data sources containing text, images, and videos.
Unsupervised Machine Learning Unsupervised learning is a method used to enable machines to classify both tangible and intangible objects without providing the machines with any prior information about the objects. The things machines need to classify are varied, such as customer purchasing habits, behavioral patterns of bacteria and hacker attacks. The main idea behind unsupervised learning is to expose the machines to large volumes of varied data and allow them to learn and infer from the data. However, the machines must first be programmed to learn from data.
Unwholesome Demand Consumers may be attracted to products that have undesirable social consequences.
URL Builder The URL Builder is a tool provided by Google that allows you to add campaign tags to your inbound URLs. It's especially helpful if you're just getting started with campaign tags as it provides a visual interface.
User An individual person browsing your website (technically, a unique browser cookie). Each user can visit your website multiple times; for example, one user could create three sessions on your website, with each session containing multiple pageviews. By default, each unique browser cookie is counted as a separate user, which means someone visiting your website on multiple devices (each with its own browser cookie) is reported as more than one user. The user ID feature allows you to track unique individuals that identify themselves on multiple devices.
User Explorer The User Explorer report allows you to view the cookie IDs that have been created in people’s browsers. This allows you to see how people interact with your website across multiple sessions.
User ID A unique identifier used to combine sessions from a known person on your website. When you can identify someone (for example, using an ID from your CRM or another system) you can send an ID to Google Analytics to enable a special set of cross-device reports. While this provides a more accurate user count, since someone needs to be identified (for example, by logging into your website), only a portion of your users will be included in these reports.
User ID Coverage When you create a dedicated User ID view in Google Analytics, the User ID Coverage report becomes available in the standard reporting views. The report shows you the percentage of users that are associated with an ID compared to those
User Timings You can report on custom time intervals with the User Timings feature. This can be used to report on the loading time of custom elements on your website, like AJAX, or to report on any custom interval, like the time needed to complete an application form. To use the reports, you will need to modify your implementation to send the custom user timings to Google Analytics.
Users Flow The Users Flow report is a visual representation of how users navigate and interact with your website. For example, you can see the paths people take as they view the content on your website after they land.
UTM Parameters Also known as UTM codes or UTM tags, UTM parameters are essentially source descriptions that are added to the end of a URL. These tags allow marketers to identify the exact source of traffic coming to their website and tie activity on specific channels to business results. Without a UTM tag, you might be able to identify that a visitor came to your site from Twitter, but you wouldn’t know what post or campaign drove that traffic. With a UTM tag, you can include the source (e.g. Twitter), the medium (e.g. email), the content (e.g. “Why Marketing is Awesome”) and the keyword associated with that campaign for clear attribution.
UTM Tag UTM tags are the individual query parameters used to make up a campaign-tagged URL. The UTM tags include utm_campaign, utm_source, utm_medium, utm_term, utm_content and the lesser-known utm_id. UTM stands for 'Urchin Tracking Module' (Urchin was the precursor to Google Analytics).
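A campaign-tagged URL can be assembled with standard URL tooling. The sketch below is one possible approach using Python's standard library; the helper `add_utm`, the example URL and the tag values are hypothetical, and `utm_campaign` is used as the campaign-name parameter that Google Analytics recognizes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_utm(url, source, medium, campaign, term=None, content=None):
    """Append UTM query parameters to a URL, keeping any existing ones."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    tags = {
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
        "utm_term": term,
        "utm_content": content,
    }
    # Only include the optional tags that were actually supplied.
    query += [(key, value) for key, value in tags.items() if value is not None]
    return urlunsplit(parts._replace(query=urlencode(query)))

tagged = add_utm(
    "https://example.com/pricing?ref=1",
    source="twitter",
    medium="social",
    campaign="spring_sale",
)
```

Using `urlencode` rather than string concatenation ensures that spaces and special characters in tag values are escaped correctly.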


Volume Volume is a 3 V's framework component used to define the size of big data that is stored and managed by an organization. It evaluates the massive amount of data in data stores and concerns related to its scalability, accessibility and manageability.
Value Chain A tool for identifying ways to create more customer value.
Value Network A system of partnerships and alliances that a firm creates to source, augment, and deliver its offerings.
Value Pricing Winning loyal customers by charging a fairly low price for a high-quality offering.
Value Proposition The whole cluster of benefits the company promises to deliver.
Value-delivery Network (supply chain) A company’s supply chain and how it partners with specific suppliers and distributors to make products and bring them to markets.
Value-Delivery System All the experiences the customer will have on the way to obtaining and using the offering.
Variance Variance (σ²) in statistics is a measurement of the spread between numbers in a data set. That is, it measures how far each number in the set is from the mean and therefore from every other number in the set.
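As a small worked example, the population variance is the average squared distance of each value from the mean. The sketch below computes it by hand for a made-up data set; the function name is hypothetical, and the standard library's `statistics.pvariance` gives the same population figure.

```python
from statistics import mean, pvariance

def population_variance(values):
    """Average squared deviation from the mean (divides by N, not N - 1)."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5
var = population_variance(data)   # squared deviations sum to 32, so 32 / 8
std_dev = var ** 0.5              # the standard deviation is the square root
```

For a sample drawn from a larger population, the usual estimate divides by N - 1 instead, which is what `statistics.variance` does.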
Variety Variety is a 3 V's framework component that is used to define the different data types, categories and associated management of a big data repository. Variety provides insight into the uniqueness of different classes of big data and how they are compared with other types of data.
Velocity The speed at which data is acquired and used. Not only are companies and organizations collecting more and more data at a faster rate, they want to derive meaning from that data as soon as possible, often in real time.
Venture Team A cross-functional group charged with developing a specific product or business.
Veracity Data veracity is the degree to which data is accurate, precise and trusted. Data is often viewed as certain and reliable. The reality of problem spaces, data sets and operational environments is that data is often uncertain, imprecise and difficult to trust.
Vertical Integration A situation in which manufacturers try to control or own their suppliers, distributors, or other intermediaries.
Vertical Marketing System (VMS) Producer, wholesaler(s), and retailer(s) acting as a unified system.
View The result set of a stored query on the data, which database users can query just as they would a persistent database object such as a table. The stored query definition is kept in the database dictionary.
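To make the idea of a view concrete, the sketch below creates one in an in-memory SQLite database and queries it like a table. The `orders` table, the `customer_totals` view and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount REAL);
INSERT INTO orders VALUES ('ann', 20.0), ('bob', 5.0), ('ann', 15.0);

-- The view stores the query, not the results.
CREATE VIEW customer_totals AS
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer;
""")

query = "SELECT total FROM customer_totals WHERE customer = 'ann'"
before = conn.execute(query).fetchone()[0]

# Because the view is re-evaluated on each query, new rows show up immediately.
conn.execute("INSERT INTO orders VALUES ('ann', 10.0)")
after = conn.execute(query).fetchone()[0]
```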
Viral Marketing Using the Internet to create word-of-mouth effects to support marketing efforts and goals.
Virtual Machine A virtual machine (VM) is an operating system or application environment that is installed on software which imitates dedicated hardware. The end user has the same experience on a virtual machine as they would have on dedicated hardware. It is a fundamental part of many modern cloud infrastructures. The other component is the hypervisor, a piece of software that creates and runs the virtual machines.
Visit-to-Contact Conversion Rate The number of new contacts divided by the number of visits for the selected time period.
Visit-to-Customer Conversion Rate The number of customers divided by the number of visits for the selected time period.
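Both conversion rates above are the same simple ratio with a different numerator. A minimal sketch follows; the function name and the figures are hypothetical, and returning 0.0 when there are no visits is an assumption made here to avoid dividing by zero.

```python
def conversion_rate(conversions, visits):
    """Conversions divided by visits for the same time period."""
    if visits == 0:
        return 0.0  # assumption: report 0 rather than raising on empty periods
    return conversions / visits

# Example period: 2,000 visits produced 50 new contacts and 8 customers.
visit_to_contact = conversion_rate(50, 2000)
visit_to_customer = conversion_rate(8, 2000)
```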
Visitor Flow The path people take through your website.
Visitors A visitor is an Internet user who comes to your website or mobile site. By comparing the number of visitors and the number of visits to your site, you can determine whether people visit your site several times throughout the day.
Visits Any time a visitor reaches your site from an outside domain. A visit will end in HubSpot when someone leaves your domain by visiting an external site or closing his or her browser. A visit will end in Google Analytics after a user is inactive for 30 minutes or more.
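The 30-minute inactivity rule can be sketched as a simple grouping of hit timestamps. This is only an illustration of the timeout idea, not how any analytics product is implemented; the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def count_visits(timestamps, timeout=timedelta(minutes=30)):
    """Count visits for one visitor: a gap of `timeout` or more starts a new visit."""
    visits = 0
    last_seen = None
    for ts in sorted(timestamps):
        if last_seen is None or ts - last_seen >= timeout:
            visits += 1
        last_seen = ts
    return visits

hits = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 1, 9, 45),   # 35 minutes after the previous hit
    datetime(2024, 1, 1, 9, 50),
]
visits = count_visits(hits)
```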
Visual Analytics Visual Analytics is a way of doing complex data analytics and representing the findings in an eye-catching and visually understandable way.
Visual Studio Microsoft Visual Studio is an integrated development environment (IDE). It is used to develop Windows programs, as well as web sites, web applications, and web services. It is also used to create local database files for SQL.
Visualization The representation of an object, situation, or set of information as a chart or other image. Phocas visualizations present your findings as a striking chart while always giving you the option to drill down into the data behind the image.


Warranties Formal statements of expected product performance by the manufacturer.
Weak AI Weak AI, or Narrow AI, is a machine intelligence that is limited to a specific or narrow area. Weak Artificial Intelligence (AI) simulates human cognition and benefits mankind by automating time-consuming tasks and by analyzing data in ways that humans sometimes can’t.
Weather Data Weather data is the set of trends and patterns used to track the state of the atmosphere. It consists mainly of numerical measurements and related factors. Real-time weather data is now available, and organizations use it in different ways. For example, a logistics company can use weather data to optimize the transportation of goods.
Web Analytics Web Analytics is the measurement, compilation, analysis and reporting of web data, such as analyzing web server logs and monitoring website visitor behaviour (click analytics). It is used as a tool for business and market research as well as assessing and improving the effectiveness of a website.
Web Scraping Web scraping is the process of pulling data from a website’s source code. It generally involves writing a script that will identify the information a user wants and pull it into a new file for later analysis.
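A scraping script typically parses the page source and keeps only the pieces of interest. The sketch below extracts link targets from a hard-coded HTML string using Python's standard-library parser; the class name and the sample markup are made up, and a real script would first download the page and should respect the site's terms of use.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in the fed HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# Stand-in for a downloaded page; no network access is needed here.
page_source = (
    '<html><body><a href="/pricing">Pricing</a><p>Hello</p>'
    '<a href="https://example.com/blog">Blog</a></body></html>'
)

collector = LinkCollector()
collector.feed(page_source)
links = collector.links
```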
WebHDFS Apache Hadoop WebHDFS is a protocol for accessing HDFS through an industry-standard RESTful mechanism. It contains native libraries and thus allows access to HDFS. It helps users connect to HDFS from outside the cluster while taking advantage of the Hadoop cluster's parallelism. It also offers web-service access to all Hadoop components.
Wholesaling All the activities in selling goods or services to those who buy for resale or business use.


XML Databases An XML database is a database that stores data in XML format. This type of database is suited for businesses with data in XML format and for situations where XML storage is a practical way to archive data, metadata and other digital resources.


YARN Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster management technology. It is a resource-management platform responsible for managing computing resources in clusters and using them for scheduling of users’ applications. It allows Hadoop to do more than just MapReduce data processing jobs. YARN wasn’t part of the first versions of Hadoop; it was developed separately as a way of extending the functionality of Hadoop. It was announced in 2011 but only became production ready with the official Hadoop v2.x release in October 2013.
Yield Pricing A situation in which companies offer (1) discounted but limited early purchases, (2) higher-priced late purchases, and (3) the lowest rates on unsold inventory just before it expires.
Yottabyte A yottabyte (YB) is a unit of digital information storage used to denote the size of data. It is equivalent to a quadrillion gigabytes, 1,000 zettabytes or 1,000,000,000,000,000,000,000,000 bytes.


Z-test A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution. Because of the central limit theorem, many test statistics are approximately normally distributed for large samples.
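As a simple illustration, a one-sample Z-test compares a sample mean with a hypothesized mean when the population standard deviation is known. The sketch below is one such test under those assumptions; the function name and the numbers are hypothetical.

```python
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample z-test, assuming sigma is known.

    Returns the z statistic and its two-sided p-value under H0: mean = mu0.
    """
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# 100 observations averaging 103 against a hypothesized mean of 100, sigma = 15.
z, p = one_sample_z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
reject_at_5_percent = p < 0.05
```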
Zero Latency Enterprise (ZLE) A firm that can respond to internal and external events as they occur because information is exchanged across departmental or divisional boundaries without any delay.
Zero-Level Channel (direct-marketing channel) A manufacturer selling directly to the final customer.
Zettabyte A zettabyte (ZB) is a unit of digital information storage used to denote the size of data. It is equivalent to 1,000 exabytes or 1,000,000,000,000,000,000,000 bytes.
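The relationships between these large storage units can be checked with a short conversion helper. The sketch below assumes decimal (SI) prefixes, matching the byte counts quoted for the zettabyte and yottabyte; the `UNITS` table and `convert` function are hypothetical names.

```python
# Bytes per unit, using decimal (SI) prefixes.
UNITS = {
    "GB": 10**9,
    "EB": 10**18,
    "ZB": 10**21,
    "YB": 10**24,
}

def convert(value, from_unit, to_unit):
    """Convert a quantity between storage units via bytes."""
    return value * UNITS[from_unit] / UNITS[to_unit]

zb_in_eb = convert(1, "ZB", "EB")   # exabytes in one zettabyte
yb_in_zb = convert(1, "YB", "ZB")   # zettabytes in one yottabyte
yb_in_gb = convert(1, "YB", "GB")   # gigabytes in one yottabyte
```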
Zookeeper Apache ZooKeeper is a top-level Apache project that acts as a centralized service for maintaining naming and configuration data and for providing flexible and robust synchronization within distributed systems. In a Kafka deployment, for example, ZooKeeper keeps track of the status of the Kafka cluster nodes as well as Kafka topics, partitions and so on.