Creating a Good Survey #15: Easy Survey Data Analysis: From Descriptive Statistics to Cross-Tabulation

Once the survey responses are in, the next task is to extract meaningful information from the pile of numbers. Many beginners are intimidated by statistical analysis, but basic methods are often enough to yield valuable insights that can inform business decisions. This article covers descriptive statistics and cross-tabulation, two core survey analysis methods that beginners can apply without specialized statistical knowledge.

 

1. The First Step in Analysis: Data Cleaning & Preparation

Before full-scale analysis, the process of refining the collected raw data is essential. This stage involves the following tasks:

  • Identifying and Handling Insincere Responses: Responses that give the same answer to every item, are completed in an abnormally short time, or contain logically contradictory answers are treated as insincere and are either excluded from the analysis or handled separately.
  • Correcting Erroneous Data: Check and correct typos or input errors.
  • Data Coding: For open-ended responses, it may be necessary to group similar content to create categories and convert them into numbers or symbols. Multiple-choice responses are also often converted to numerical codes for ease of analysis.

Clean, refined data is the basic condition for enhancing the reliability of analysis results.
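The insincere-response checks described above can be sketched in a few lines of Python. This is only an illustration: the record layout (`answers`, `seconds`), the sample respondents, and the 60-second cutoff `MIN_SECONDS` are assumptions made for the example, not values from any particular survey tool.

```python
# Hypothetical respondent records: Likert answers (1-5) and completion time in seconds.
respondents = [
    {"id": 1, "answers": [4, 5, 3, 4, 2], "seconds": 180},
    {"id": 2, "answers": [3, 3, 3, 3, 3], "seconds": 150},  # same answer to every item
    {"id": 3, "answers": [5, 4, 4, 5, 3], "seconds": 20},   # finished very quickly
]

MIN_SECONDS = 60  # assumed cutoff; pick one that fits the length of your survey


def quality_flags(record):
    """Return the list of quality problems found in one respondent's record."""
    flags = []
    if len(set(record["answers"])) == 1:
        flags.append("straight-lining")  # all items answered identically
    if record["seconds"] < MIN_SECONDS:
        flags.append("too fast")  # abnormally short response time
    return flags


# Respondents with at least one flag, keyed by id, for manual review.
flagged = {r["id"]: quality_flags(r) for r in respondents if quality_flags(r)}

# Respondents with no flags, kept for analysis.
clean = [r for r in respondents if not quality_flags(r)]

print("Flagged:", flagged)
print("Kept for analysis:", [r["id"] for r in clean])
```

Flagging first and deciding afterwards, rather than deleting rows immediately, matches the option of handling such responses separately.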

 

2. Grasping the Overall Picture of Data: Descriptive Analysis

Descriptive analysis is the most basic method: it summarizes the characteristics and distribution of the collected data, so you can see overall trends and key features at a glance.

  • Frequency Analysis: This method identifies how responses are distributed across the answer items (categories) of each question.
    • Example: For the "OOO Product Satisfaction" question, count the respondents in each response category: 'Very Satisfied' 30, 'Satisfied' 40, 'Neutral' 20, 'Dissatisfied' 7, 'Very Dissatisfied' 3.
  • Percentages: Calculate the proportion (%) each response category occupies relative to the total number of respondents to understand its relative importance or weight.
    • Example: In the satisfaction example above, 'Very Satisfied' is 30% (30 people / total 100 people), 'Satisfied' is 40% (40 people / total 100 people), etc.
  • Measures of Central Tendency: For data collected in numerical form (e.g., Likert scale scores, age, purchase amount), use indicators that show around what value the data is distributed.
    • Mean: The value obtained by adding all data values and dividing by the number of data points. (e.g., The average customer satisfaction score is 3.8 out of 5 points).
    • Median: The value located in the very center when data is arranged in order of size. It is less affected by extreme values.
    • Mode: The value that appears most frequently in the data.
  • Usage Examples:
    • "60% of our product users are women in their 20s and 30s." (Frequency and Percentage)
    • "The average preference score for new ad creative A was 4.2 points (out of 5), which was higher than creative B (3.5 points)." (Mean)
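For readers who want to check these figures outside Excel, the sketch below computes them with Python's standard library. It reuses the response counts from the satisfaction example above. The numeric coding (5 for 'Very Satisfied' down to 1 for 'Very Dissatisfied') is an assumption made for illustration.

```python
from collections import Counter
from statistics import mean, median, mode

# Response counts from the satisfaction example (100 respondents in total).
counts = {
    "Very Satisfied": 30,
    "Satisfied": 40,
    "Neutral": 20,
    "Dissatisfied": 7,
    "Very Dissatisfied": 3,
}

# Assumed coding for the example: 5 = Very Satisfied ... 1 = Very Dissatisfied.
score_of = {
    "Very Satisfied": 5,
    "Satisfied": 4,
    "Neutral": 3,
    "Dissatisfied": 2,
    "Very Dissatisfied": 1,
}

# Expand the counts into one entry per respondent, as raw survey data would look.
responses = [label for label, n in counts.items() for _ in range(n)]

# Frequency analysis and percentages.
frequency = Counter(responses)
total = sum(frequency.values())
percentages = {label: 100 * n / total for label, n in frequency.items()}

# Measures of central tendency on the coded scores.
scores = [score_of[label] for label in responses]
mean_score = mean(scores)
median_score = median(scores)
mode_score = mode(scores)

for label in counts:
    print(f"{label}: {frequency[label]} ({percentages[label]:.0f}%)")
print("mean:", mean_score, "median:", median_score, "mode:", mode_score)
```

Under this coding the mean works out to 3.87, while the median and mode are both 4 ('Satisfied').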

 

3. Discovering Hidden Relationships: Cross-Tabulation Analysis (Crosstab Analysis)

Cross-tabulation analysis examines the relationship between two or more categorical variables (e.g., gender, age group, purchase status, satisfaction level). It shows how a specific respondent group (e.g., women in their 20s) differs from other groups (e.g., men in their 40s) in its answers to a particular question, and what kind of association exists between two variables. This provides important clues for establishing targeted marketing strategies.

  • Concept: Build a contingency table (crosstab) that shows how the categories of one variable are distributed within each category of another variable.
  • Purpose: Used to compare responses between specific groups, identify associations between variables, and test specific hypotheses.
  • How to Perform Cross-Tabulation Analysis Using Excel PivotTable (Simple Example):
    1. Prepare Data: Organize the survey data to be analyzed in an Excel sheet. Configure each row to represent an individual respondent and each column to represent a question item (variable).
    2. Insert PivotTable: Select the data range, then click [Insert] > [PivotTable] from the Excel menu.
    3. Arrange Fields: From the PivotTable Fields list, drag and drop the variables you want to analyze into the 'Row Labels', 'Column Labels', and 'Values' areas (labeled 'Rows' and 'Columns' in recent Excel versions).
      • Example: Drag the 'Age Group' variable to 'Row Labels', the 'Product Satisfaction' variable to 'Column Labels', and 'Respondent ID' (or any variable) to the 'Values' area, then change the Value Field Settings to 'Count'.
    4. Interpret Results: Through the generated crosstab, compare and analyze the product satisfaction response distribution for each age group (e.g., 20s have a high 'Very Satisfied' ratio, while 50s have a high 'Neutral' ratio). If necessary, change the value display format to '% of Row Total', '% of Column Total', etc., to compare based on ratios.
  • Usage Examples:
    • "The 20s customer group values 'latest features,' while the 50s and older customer group considers 'ease of use' more important." (Difference in importance factors by age group)
    • "The repurchase rate of customers who participated in promotion A was 60%, while that of customers who did not participate was 30%, suggesting that promotion A may have had a positive impact on repurchases." (Relationship between promotion participation and repurchase rate)
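The PivotTable steps above amount to counting respondents for each combination of categories and then converting the counts to row percentages. A minimal Python sketch of the same idea follows. The eight respondents are hypothetical data invented for illustration, and only three satisfaction levels are used to keep the table small.

```python
from collections import Counter

# Hypothetical respondents: (age group, product satisfaction). One tuple per respondent.
rows = [
    ("20s", "Very Satisfied"), ("20s", "Very Satisfied"),
    ("20s", "Satisfied"), ("20s", "Neutral"),
    ("50s", "Neutral"), ("50s", "Neutral"),
    ("50s", "Satisfied"), ("50s", "Very Satisfied"),
]

# Count each (age group, satisfaction) combination.
pair_counts = Counter(rows)
age_groups = sorted({age for age, _ in rows})
levels = ["Very Satisfied", "Satisfied", "Neutral"]

# Contingency table: crosstab[age][level] = number of respondents.
crosstab = {
    age: {level: pair_counts[(age, level)] for level in levels}
    for age in age_groups
}

# Row percentages, comparable to Excel's '% of Row Total' display.
row_pct = {}
for age, cells in crosstab.items():
    row_total = sum(cells.values())
    row_pct[age] = {level: 100 * n / row_total for level, n in cells.items()}

for age in age_groups:
    cells = ", ".join(
        f"{level}: {crosstab[age][level]} ({row_pct[age][level]:.0f}%)"
        for level in levels
    )
    print(f"{age} -> {cells}")
```

Row percentages matter when groups differ in size, because raw counts alone would favor the larger group. For larger datasets, the pandas function `pandas.crosstab` builds the same kind of table in one call.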

(Reference) Chi-Square Test: A statistical test for determining whether the association observed between two variables in a crosstab is statistically significant, i.e., unlikely to be the result of chance alone. While it may be difficult for beginners to perform directly, knowing that such tests exist helps when interpreting analysis results.
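To make the idea concrete, here is a rough sketch of the Pearson chi-square calculation for a 2x2 table, using only Python's standard library. The counts are hypothetical: 100 promotion participants and 100 non-participants, chosen only to match the 60% and 30% repurchase rates in the usage example. The p-value shortcut via `math.erfc` is valid only for a 2x2 table (1 degree of freedom), and no continuity correction is applied.

```python
import math

# Hypothetical counts matching the 60% vs. 30% repurchase rates in the example.
# Rows: participated in promotion A, did not participate.
# Columns: repurchased, did not repurchase.
observed = [
    [60, 40],
    [30, 70],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected is the count we would see if the two variables were unrelated.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

# With 1 degree of freedom (2x2 table only), the upper-tail probability
# of the chi-square distribution equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, p = {p_value:.5f}")
```

With these made-up counts the statistic is about 18.18 and the p-value is far below the conventional 0.05 threshold, so a gap this large would be unlikely under chance alone. Statistical packages may apply a continuity correction to 2x2 tables, so their results can differ slightly from this sketch.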

 

Easy Analysis Tips for Beginners:

  • Don't Forget Your Goals: Analysis is not for its own sake but to find answers to survey goals and aid decision-making. Always keep survey goals in mind when setting the analysis direction.
  • Formulate and Test Hypotheses: Start from a hypothesis such as "OOO will happen" and check whether the data support or contradict it.
  • Start Small: Rather than trying to analyze all variables at once from the beginning, start by examining the relationships between the variables you consider most important.
  • Utilize Visualization: As will be detailed in the next article, visualizing analysis results with charts or graphs makes it easier to discover and understand patterns.

 

Descriptive statistics and cross-tabulation analysis are powerful and practical methods that allow beginners to extract meaningful information from survey data using familiar tools like Excel, even without complex statistical programs. Utilize these basic analysis techniques to discover the customer's voice hidden in the data and use it to establish better marketing strategies.