
Statistics: Checklist: Statistics vs Data

Guide on Statistical Sources

Statistical sources and their terminology

Is it a statistic or data/dataset that you need to support your research arguments or hypotheses?

Statistics: a summary or interpretation of data

Examples: graphs, descriptive summary tables, charts, in-text mentions

Statistical sources: statistical databases, web platforms, literature, documents and publications

Data: raw information that may not make much sense without statistical analysis

Examples: qualitative data, quantitative data, microdata (data at the individual level of observation)

Data sources: data repositories, data publishers and journals, open data communities, Google Dataset Search
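
To make the distinction concrete, the short sketch below starts from a few microdata records and derives two summary statistics from them. The records, field names and values are all hypothetical, invented purely for illustration; real microdata would come from one of the data sources listed above.

```python
from statistics import mean

# Hypothetical microdata: one record per survey respondent (values are invented).
respondents = [
    {"id": 1, "age": 23, "employed": True},
    {"id": 2, "age": 35, "employed": False},
    {"id": 3, "age": 41, "employed": True},
    {"id": 4, "age": 29, "employed": True},
]

# Statistics: summaries computed from the raw records.
mean_age = mean([r["age"] for r in respondents])
employed_count = sum(1 for r in respondents if r["employed"])
employment_rate = 100 * employed_count / len(respondents)

print(f"Mean age: {mean_age}")
print(f"Employment rate: {employment_rate:.0f}%")
```

The list of records is the data; the mean age and the employment rate are statistics that could be quoted in-text or shown in a chart.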

 

Common terminologies for statistical sources

Structured vs unstructured data: Structured data fits into data tables and can be analysed quantitatively. Unstructured data can appear in any form without a fixed format, e.g. text documents, audio files.
Microdata: Data at the individual level of observation, frequently confidential and often anonymised. Commonly found in surveys, government data, and commercial or research data.
Cross-sectional data: Data collected from a sample at a single point in time (a snapshot). It can be one cross-section of a longitudinal dataset.
Longitudinal/panel data: Data collected repeatedly over time from exactly the same sample, involving multiple variables.
Percentage vs percentage point: Example: when the success rate increases from 4% to 8%, the increase is described as a 100% increase ([8 - 4]/4) and a 4 percentage point increase (8 - 4).
Unit of analysis: The level at which the data is observed or reported: individual, grouped, or aggregated.
Time series data: A series of observations of the same variables collected over time.
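
The percentage vs percentage point example can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical and chosen only for this illustration.

```python
def percentage_point_change(old_rate: float, new_rate: float) -> float:
    """Absolute difference between two rates that are both expressed in percent."""
    return new_rate - old_rate


def percent_change(old_rate: float, new_rate: float) -> float:
    """Relative change of new_rate with respect to old_rate, in percent."""
    if old_rate == 0:
        raise ValueError("relative change is undefined when the starting rate is 0")
    return (new_rate - old_rate) / old_rate * 100


# Success rate rising from 4% to 8%, as in the definition above.
old, new = 4.0, 8.0
print(percentage_point_change(old, new))  # 4.0 percentage points
print(percent_change(old, new))           # 100.0 percent
```

When quoting a change in a rate, state which of the two measures is being reported, since the same change gives very different numbers.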

 

How to go about finding statistical sources

A thinking framework to find statistics or data

  1. What is your research question or topic?
    1. Define and identify the concepts that require statistical or data evidence
  2. What kind of statistics or data do you need?
    1. Descriptive analyses and summaries
      1. Statistics to mention in-text as supporting evidence
      2. Statistical diagrams to show trends or summaries
    2. Datasets to run further analysis on
      1. To support your hypothesis and come up with research findings
      2. Statistical diagrams and attachments of analyses to show methods of analysis
      3. Software you have available to run these analyses, which limits the file formats you can use

Statistic/data checklist

  1. Describe the statistic or data that you need
    1. Topic/concept
    2. Terms generally associated with the topic
    3. Related concepts or proxies for the topic
    4. Who collects and produces these statistics?
    5. Where are they published?
    6. Type - e.g. growth rate, percentage, absolute numbers, visualisations, trends...
    7. Spread - time coverage, frequency, geographic range
  2. Sources to start with
    1. Find subject guides based on a topic search: which subject(s) are associated with your topical keywords? Use the recommended sources to find literature, then search that literature and its references for relevant statistics
    2. Statistical databases and platforms
    3. Targeted web search

Where can I find data or learn more about data?

Finding data for your research is similar to finding statistics as described above, but additional skills are useful for drawing insights from a dataset once you have obtained it.

Register for the Data Insights workshop series to find out more. The workshop series is based on a framework adapted from the CRISP-DM data analytics life cycle (Joubert, 2020) and the Data Literacy Competencies Matrix (Ridsdale et al., 2015).

For more information on finding datasets, visit the Research Data Management Library Guide, in particular the List of Data Repositories.

References

Joubert, S. (2020, August 7). Understanding the Lifecycle of a Data Analysis Project. Graduate Blog. https://graduate.northeastern.edu/resources/data-analysis-project-lifecycle/

Ridsdale, C., Rothwell, J., Smit, M., Bliemel, M., Irvine, D., Kelley, D., Matwin, S., Wuetherick, B., & Ali-Hassan, H. (2015). Strategies and Best Practices for Data Literacy Education Knowledge Synthesis Report. https://doi.org/10.13140/RG.2.1.1922.5044