andy
andy

Reputation: 2077

Create a Pivot Table of Two Categorical and Numerical Variables

I have the following hypothetical dataframe

Region <- c("District A", "District B","District A","District A","District B")
Gender <- c("Male","Male","Female", "Male","Female")
Age <- c(20, 21, 23, 34, 22)
AmountSold <- c(50, 10, 20, 4, 12)
RegionSales <- data.frame(Region, Gender, Age, AmountSold)

I'd like to create a pivot table or a table that shows both mean of Amount sold per gender and Region as well as the mean of Age per gender and region. How do I do that in R?

Upvotes: 2

Views: 295

Answers (3)

ktiu
ktiu

Reputation: 2636

This would be my approach with the dplyr package:

library(dplyr)

RegionSales %>%
  group_by(Region, Gender) %>%
  summarize(mean_age = mean(Age), mean_amount = mean(AmountSold))

Output:

# A tibble: 4 x 4
# Groups:   Region [2]
  Region     Gender mean_age mean_amount
  <chr>      <chr>     <dbl>       <dbl>
1 District A Female       23          20
2 District A Male         27          27
3 District B Female       22          12
4 District B Male         21          10

An option that ignores NA values:

RegionSales %>%
  group_by(Region, Gender) %>%
  summarize(mean_age = mean(Age, na.rm = T),
            mean_amount = mean(AmountSold, na.rm = T))

Upvotes: 3

ThomasIsCoding
ThomasIsCoding

Reputation: 102469

A base option using aggregate could help

> aggregate(. ~ Region + Gender, RegionSales, mean)
      Region Gender Age AmountSold
1 District A Female  23         20
2 District B Female  22         12
3 District A   Male  27         27
4 District B   Male  21         10

Upvotes: 2

akrun
akrun

Reputation: 887691

With dplyr, another option is to specify the variables in across

library(dplyr)
RegionSales %>%
    group_by(Region, Gender) %>%
    summarise(across(c(Age, AmountSold),
             ~ mean(., na.rm = TRUE), .names = "mean_{.col}"))

Upvotes: 3

Related Questions