Glossary
- Aggregation bias
- Aggregation bias occurs when incorrect inferences are made from data that have been aggregated, or summarized. This includes making inferences about the characteristics of the parts of a whole based on the characteristics of the whole itself.
- Aggregation
- Aggregation refers to the process by which collected data are summarized in some way or sorted into categories.
- Applied statistics
- Applied statistics is a type of statistics that involves applying statistical methods to problems in particular disciplines and areas of study.
- Arithmetic mean
- An arithmetic mean, often simply called a mean, is a type of average, or measure of central tendency, in which the center of a dataset is determined by adding its numeric values and dividing the sum by the number of values.
- Attitudinal statement
- An attitudinal statement is a type of response given to a scaled question on a survey that asks an individual to rate his or her feelings toward a particular topic.
- Axis label
- An axis label appears along an axis of a graph and denotes the variable plotted on that axis, whether dependent or independent, along with its unit or rate of measurement.
- Back transformation
- Back transformation is the process by which mathematical operations are applied to data that have already been transformed in order to revert the data to their original form.
- Back-end check
- A back-end check, also called a server-side check, is a type of data validation for datasets gathered electronically and is performed at the back end, or after data are stored in an electronic database.
- Bar graph
- A bar graph or chart uses horizontal or vertical bars whose lengths proportionally represent values in a dataset. A chart with vertical bars is also called a column graph or chart.
- Cartogram
- A cartogram is a map that overlays categorical data onto a projection and uses a different color to represent each category. Unlike a heat map, a cartogram does not necessarily use color saturation to depict the frequency of values in a category.
- Categorical data
- Categorical data are data, quantitative or qualitative, that can be sorted into distinct categories.
- Category label
- A category label is used on a graph to denote the name of a category, or group, of data; the label may be descriptive text or a range of numeric values.
- Chart title
- A chart title is the description assigned to a graph. It summarizes the message intended for the target audience and may include information about the dataset.
- Checkbox response
- A checkbox response refers to an answer given to a question in a survey administered in electronic form, for which one or more responses can be selected at a time, as may be indicated by a checkbox an individual clicks on.
- Chroma
- Chroma is the saturation, or vividness, of a hue.
- Closed question
- A closed, or closed-ended, question is a type of question featured in a poll or survey that restricts responses to a limited set of options or a specific kind of answer, and is used to collect quantitative data or data that can later be analyzed quantitatively.
- Codebook
- A codebook documents the descriptions, terms, variables, and values that are represented by abbreviated or coded words or symbols used in a dataset, and serves as a means for coding and decoding the information.
- Color theory
- Color theory refers to the principles of design that focus on colors and the relationships between them.
- Continuous variable
- A continuous variable, or continuous scale, has an unlimited number of possible values between the highest and lowest values in a dataset.
- Correlation
- Correlation measures the degree of association, or the strength of the relationship, between two variables using mathematical operations.
- CRAAP test
- The CRAAP test denotes a set of questions a researcher may use to assess the quality of source information across five criteria: currency, relevance, authority, accuracy, and purpose.
- Data cleaning
- Data cleaning, also called data checking or data validation, is the process by which missing, erroneous, or invalid data are identified and then cleaned out, or removed, from a dataset. Data cleaning follows the data preparation process.
- Data label
- A data label is used on a graph to denote the value of a plotted point.
- Data preparation
- Data preparation is the process by which data are readied for analysis and includes the formatting, or normalizing, of values in a dataset.
- Data transformation
- Data transformation is the process by which data in a dataset are transformed, or changed, during data cleaning and involves the use of mathematical operations in order to reveal features of the data that are not observable in their original form.
- Data visualization
- Data visualization, or data presentation, is the process by which data are visualized, or presented, after the data cleaning process, and involves making choices about which data will be visualized, how data will be visualized, and what message will be shared with the target audience of the visualization. The end result may be referred to as a data visualization.
- Data
- Data are observations, facts, or numeric values that can be described, measured, interpreted, or analyzed.
- Dependent variable
- A dependent variable is a type of variable whose value is determined by, or depends on, another variable.
- Descriptive statistics
- Descriptive statistics is a type of applied statistics that numerically summarizes or describes data that have already been collected; its conclusions are limited to that dataset.
- Diary
- A diary is a data collection method in which data, qualitative or quantitative, are tracked over an extended period of time.
- Dichotomous question
- A dichotomous question is a type of closed question featured in a poll or survey that requires an individual to choose only one of two possible responses.
- Direct measurement
- Direct measurement is a type of measurement method that involves taking an exact measurement of a variable and recording that numeric value in a dataset.
- Discrete variable
- A discrete variable, or a discrete scale, has a limited number of possible values between the highest and lowest values in a dataset.
- External data
- External data refer to data that a researcher or organization uses but that were collected by an outside researcher or organization.
- Factoid
- A factoid, or trivial fact, is a single piece of information that emphasizes a particular point of view, idea, or detail. A factoid does not allow for any further statistical analysis.
- Filter
- A filter is a programmed list of conditions used to filter, or check, items against those conditions, and it may specify further instructions for the items that meet them.
- Focus group
- A focus group is a data collection method used for qualitative research in which a group of selected individuals participate in a guided discussion.
- Forced question
- A forced question is a type of scaled question featured in a survey that requires an individual to choose from a given range of possible responses, none of which is neutral.
- Front-end check
- A front-end check, also called a client-side check, is a type of data validation for datasets gathered electronically, and is performed at the front end, or before data are stored in an electronic database.
- Graphical user interface (GUI)
- A graphical user interface, or GUI, is a type of interface that allows a user to interact with a computer through graphics, such as icons and menus, in place of lines of text.
- Heat map
- A heat map is a graph that uses colors to represent categorical data in which the saturation of the color reflects the category’s frequency in the dataset.
- Histogram
- A histogram is a graph that uses bars to represent the distribution of a continuous variable, with each bar sized in proportion to how frequently values within its interval occur in the dataset.
- Hue
- A hue, as defined in color theory, is a color without any black or white pigments added to it.
- Independent variable
- An independent variable is a type of variable that can be changed, or manipulated, and determines the value of at least one other variable.
- Inferential statistics
- Inferential statistics is a type of applied statistics that makes inferences, or predictions, beyond the dataset.
- Infographic
- An infographic is a graphical representation of data that may combine several different types of graphs and icons in order to convey a specific message to a target audience.
- Interactive graphic
- An interactive graphic is a type of visualization designed for digital or print media that presents information that allows, and may require, input from the viewer.
- Interviewer effect
- Interviewer effect refers to any influence an interviewer has on subjects' responses to the questions.
- Invalid data
- Invalid data are values in a dataset that fall outside the range of valid, or acceptable, values, as identified during data cleaning.
- Leading question
- A leading question is a type of question featured in a poll or survey that prompts, or leads, an individual to choose a particular response and produces a skewed, or biased, dataset.
- Legend
- A legend is used on a graph to denote the meaning of colors, abbreviations, or symbols used to represent data in a dataset.
- Legibility
- Legibility is a term used in typography and refers to the ease with which individual characters in a text can be distinguished from one another when read.
- Line graph
- A line graph uses plotted points that are connected by a line to represent values of a dataset with one or more dependent variables and one independent variable.
- Median
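- A median is a type of average, or measure of central tendency, in which the middle of a dataset is determined by arranging its numeric values in order and taking the middle value, or the mean of the two middle values when the number of values is even.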
- Metadata
- Metadata are data about other data, and may be used to clarify or give more information about some part or parts of another dataset.
- Missing data
- Missing data are values in a dataset that have not been stored sufficiently, whether blank or partial, and may be marked by the individual working with the dataset.
- Mode
- A mode is a numeric value that appears most often in a dataset.
- Motion graphic
- A motion graphic is a type of visualization designed for digital media that presents moving information without need for input from the viewer.
- Multiseries
- A multiseries is a dataset that compares multiple series, or two or more dependent variables with one independent variable.
- Normal distribution
- A normal distribution, often called a bell curve, is a type of data distribution in which the values in a dataset are distributed symmetrically around the mean value. Normally distributed data take the shape of a bell when represented on a graph; the peak of the bell is centered on the mean of the sample, and its width is determined by the standard deviation of the sample.
- Open content
- Open content, open access, open source, and open data are closely related terms that refer to digital works that are free of most copyright restrictions. Generally, the original creator has licensed a work for use by others at no cost so long as some conditions, such as author attribution, are met (See: Suber, Peter. Open Access, Cambridge, Massachusetts: MIT Press, 2012). Conditions vary from license to license and determine how open the content is.
- Open question
- An open, or open-ended, question is a type of question featured in a survey that does not require a specific kind of response and is used to collect qualitative data.
- Order bias
- Order bias occurs when the sequencing of questions featured in a survey has an effect on the responses an individual chooses, and produces a biased, or skewed, dataset.
- Outlier
- An outlier is an extremely high or extremely low numeric value that lies outside the distribution of most of the values in a dataset.
- Pattern matching
- Pattern matching is the process by which a sequence of characters is checked against a pattern in order to determine whether the characters are a match.
- Peer-to-peer (P2P) network
- A peer-to-peer network, often abbreviated P2P, is a network of computers that allows for peer-to-peer sharing, or shared access to files stored on the computers in the network rather than on a central server.
- Pie chart
- A pie chart is a circular graph divided into sectors whose areas are proportional to the parts of the whole they represent, and is used to represent the frequency of values in a dataset.
- Population
- A population is the complete set from which a sample is drawn.
- Probability
- Probability is the measure of how likely, or probable, it is that an event will occur.
- Qualitative data
- Qualitative data are a type of data that describe the qualities or attributes of something using words or other non-numeric symbols.
- Quantitative data
- Quantitative data are a type of data that quantify or measure something using numeric values.
- Radio response
- A radio response refers to an answer given to a question in a poll or survey administered in electronic form, for which only one response can be selected at a time, as may be indicated by a round radio button an individual clicks on.
- Range check
- A range check is a type of check used in data cleaning that determines whether any values in a dataset fall outside a particular range.
- Range
- A range is determined by taking the difference between the highest and lowest numeric values in a dataset.
- Raw data
- Raw data refer to data that have been collected from a source but not yet manipulated or analyzed.
- Readability
- Readability is a term used in typography and refers to the ease with which a sequence of characters in a text can be read. Factors affecting readability include the placement of text on a page and the spacing between characters, words, and lines of text.
- Sample
- A sample is a set of collected data drawn from a population.
- Sampling bias
- Sampling bias occurs when some members of a population are more or less likely than other members to be represented in a sample of that population.
- Scaled question
- A scaled question is a type of question featured in a survey that requires an individual to choose from a given range of possible responses.
- Scatterplot
- A scatterplot uses plotted points (that are not connected by a line) to represent values of a dataset with one or more dependent variables and one independent variable.
- Series graph
- A series graph proportionally represents values of a dataset with two or more dependent variables and one independent variable.
- Series
- A series is a dataset that compares one or more dependent variables with one independent variable.
- Shade
- Shade refers to adding black to a hue in order to darken it.
- Skewed data
- Skewed data are data with an asymmetric, non-normal distribution whose values trail off in a longer tail on one side of the mean when represented on a graph: to the left in left-skewed data, or to the right in right-skewed data.
- Stacked bar graph
- A stacked bar graph is a type of bar graph whose bars are divided into sub-sections, each of which proportionally represents a category of data in a dataset; the sub-sections stack together to form a larger category.
- Standard deviation
- A standard deviation is a measure of how much the values in a dataset vary, or deviate, from the arithmetic mean, and is calculated by taking the square root of the variance.
- Static graphic
- A static graphic is a type of visualization designed for digital or print media that presents information without need for input from the viewer.
- Statistics
- Statistics is the study of collecting, measuring, and analyzing quantitative data using mathematical operations.
- Summable multiseries
- A summable multiseries is a type of multiseries with two or more dependent variables that can be added together and compared with an independent variable.
- Summary record
- A summary record is a record in a database that has been sorted, or aggregated, in some way after having been collected.
- Tint
- Tint refers to adding white to a hue in order to lighten it.
- Transactional record
- A transactional record is a record in a database that has not yet been sorted, or aggregated, after collection.
- Value (color)
- Value, or brightness, refers to the tint, shade, or tone of a hue that results from adding black or white pigments to a base color.
- Variance
- Variance, or statistical variance, is a measure of how spread out the numeric values in a dataset are, or how much they vary from the arithmetic mean. It is calculated from the squared differences between each value and the mean.
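As a rough illustration of the entries for arithmetic mean, median, mode, range, variance, and standard deviation, the following Python sketch computes each measure for a small, made-up dataset using the standard library's `statistics` module. The function name `summarize` and the sample values are hypothetical. Because the glossary does not say whether variance is computed for a population or a sample, this sketch assumes population variance (`pvariance`, which divides by the number of values); `statistics.variance` gives the sample version, which divides by one less than the number of values.

```python
import statistics

def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        # Arithmetic mean: sum of the values divided by how many there are
        "mean": statistics.mean(values),
        # Median: middle value after sorting (mean of the two middle values if the count is even)
        "median": statistics.median(values),
        # Mode: most frequent value (this example assumes a single mode)
        "mode": statistics.mode(values),
        # Range: highest value minus lowest value
        "range": max(values) - min(values),
        # Population variance (an assumption): mean of the squared deviations from the mean
        "variance": statistics.pvariance(values),
        # Standard deviation: square root of the population variance
        "std_dev": statistics.pstdev(values),
    }

# Hypothetical example data
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(summarize(data))
```

For these values, the mean is 5, the median is 4.5 (the average of the two middle values, 4 and 5), the mode is 4, the range is 7, the population variance is 4, and the standard deviation is 2.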
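The correlation entry does not name a specific measure. One widely used choice is Pearson's correlation coefficient, r, which ranges from -1 (a perfect negative linear relationship) to 1 (a perfect positive linear relationship), with 0 indicating no linear relationship. The sketch below assumes Pearson's r and computes it directly from its definition; the function name `pearson_r` and the example values are hypothetical. Python 3.10 and later also provide `statistics.correlation` for the same purpose.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of the products of paired deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of each variable's sum of squared deviations
    norm = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    if norm == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / norm

# Hypothetical paired observations
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, positive
print(pearson_r([1, 2, 3], [1, 3, 2]))        # weaker positive association
```

Because r captures only linear association, two variables can be strongly related in a nonlinear way and still have an r near 0.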
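The range check, missing data, and invalid data entries describe how problem values are found during data cleaning. The sketch below assumes a hypothetical survey column of ages with an acceptable range of 0 to 120, and treats `None` or an empty string as missing; the function name `range_check`, the bounds, and the values are illustrative only.

```python
def range_check(values, low, high):
    """Split values into valid, invalid (out of range), and missing entries.

    Each result is a list of (index, value) pairs so that problem values
    can be traced back to their position in the dataset.
    """
    valid, invalid, missing = [], [], []
    for i, v in enumerate(values):
        if v is None or v == "":
            # Blank entries are treated as missing (an assumption for this sketch)
            missing.append((i, v))
        elif low <= v <= high:
            valid.append((i, v))
        else:
            # Values outside the acceptable range are flagged as invalid
            invalid.append((i, v))
    return valid, invalid, missing

# Hypothetical ages collected in a survey
ages = [34, 27, None, 152, 45, -3, ""]
valid, invalid, missing = range_check(ages, 0, 120)
print("invalid:", invalid)
print("missing:", missing)
```

The function only identifies problem values and leaves the dataset unchanged, so the decision to correct or remove each flagged entry stays with the person cleaning the data.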
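Pattern matching is often implemented with regular expressions. As a hedged example, the sketch below uses Python's `re` module to check whether an entry is a five-digit US ZIP code, optionally followed by a hyphen and four more digits. The rule, the pattern, and the function name `matches_zip` are assumptions chosen for illustration; a real dataset would need a pattern suited to its own fields.

```python
import re

# Hypothetical rule: five digits, optionally followed by "-" and four digits (ZIP+4).
# [0-9] is used instead of \d because \d also matches non-ASCII digits in str patterns.
ZIP_PATTERN = re.compile(r"[0-9]{5}(-[0-9]{4})?")

def matches_zip(text):
    """Return True only if the entire string matches the ZIP code pattern."""
    return ZIP_PATTERN.fullmatch(text) is not None

# Hypothetical entries from a dataset column
for entry in ["02139", "02139-4307", "2139", "02139-43", " 02139"]:
    print(repr(entry), matches_zip(entry))
```

Using `fullmatch` rather than `search` means that stray characters before or after an otherwise valid code, such as a leading space, cause the check to fail.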
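The data transformation and back transformation entries can be illustrated with a base-10 logarithm, one common transformation for right-skewed data. This choice is an assumption; the glossary does not prescribe a particular operation. The function names and income values below are hypothetical. The transform compresses large values so that features of the data are easier to see, and the back transform raises 10 to each transformed value to return to the original scale.

```python
import math

def log_transform(values):
    """Apply a base-10 log transform; every value must be positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires positive values")
    return [math.log10(v) for v in values]

def back_transform(values):
    """Revert log_transform by raising 10 to the power of each value."""
    return [10 ** v for v in values]

# Hypothetical right-skewed data spanning several orders of magnitude
incomes = [1_000, 10_000, 100_000, 1_000_000]
logged = log_transform(incomes)    # evenly spaced values on the log scale
restored = back_transform(logged)  # original values, up to floating-point rounding
print(logged)
print(restored)
```

Back-transforming a statistic computed on the log scale does not always give the matching statistic on the original scale: back-transforming the mean of the logged values, for example, yields the geometric mean of the original data rather than its arithmetic mean.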
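The entries for transactional records, summary records, aggregation, and aggregation bias can be illustrated by grouping individual records and totaling them. The sketch below uses hypothetical sales records and a hypothetical helper, `summarize_by`, to turn one row per sale into one summary record per region.

```python
from collections import defaultdict

# Hypothetical transactional records: one row per sale, not yet aggregated
transactions = [
    {"region": "North", "amount": 120},
    {"region": "South", "amount": 80},
    {"region": "North", "amount": 45},
    {"region": "South", "amount": 30},
    {"region": "East", "amount": 60},
]

def summarize_by(records, key, field):
    """Aggregate records into one summary record (total and count) per group."""
    groups = defaultdict(lambda: {"total": 0, "count": 0})
    for rec in records:
        group = groups[rec[key]]
        group["total"] += rec[field]
        group["count"] += 1
    return dict(groups)

summary = summarize_by(transactions, "region", "amount")
print(summary)
```

Once aggregated, the summary records no longer show individual sales. Concluding from the North region's average of 82.5 that each North sale was near that amount, when the actual sales were 120 and 45, would be an example of aggregation bias.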