TvCasteren

Reputation: 425

Combining data with Base R

I need to translate my dplyr code into base R. The dplyr code returns three columns: the competitor's sex, the Olympic season, and the number of distinct sports. It looks like this:

olympics %>% 
  group_by(Sex, Season, Sport) %>% 
  summarise(n()) %>% 
  group_by(Sex, Season) %>%
  summarise(n()) %>%
  setNames(c("Competitor_Sex", "Olympic_Season", "Num_Sports"))

A sample of my data (dput output):

 structure(list(Name = c("A Lamusi", "Juhamatti Tapio Aaltonen", 
"Andreea Aanei", "Jamale (Djamel-) Aarrass (Ahrass-)", "Nstor Abad Sanjun", 
"Nstor Abad Sanjun"), Sex = c("M", "M", "F", "M", "M", "M"), 
    Age = c(23L, 28L, 22L, 30L, 23L, 23L), Height = c(170L, 184L, 
    170L, 187L, 167L, 167L), Weight = c(60, 85, 125, 76, 64, 
    64), Team = c("China", "Finland", "Romania", "France", "Spain", 
    "Spain"), NOC = c("CHN", "FIN", "ROU", "FRA", "ESP", "ESP"
    ), Games = c("2012 Summer", "2014 Winter", "2016 Summer", 
    "2012 Summer", "2016 Summer", "2016 Summer"), Year = c(2012L, 
    2014L, 2016L, 2012L, 2016L, 2016L), Season = c("Summer", 
    "Winter", "Summer", "Summer", "Summer", "Summer"), City = c("London", 
    "Sochi", "Rio de Janeiro", "London", "Rio de Janeiro", "Rio de Janeiro"
    ), Sport = c("Judo", "Ice Hockey", "Weightlifting", "Athletics", 
    "Gymnastics", "Gymnastics"), Event = c("Judo Men's Extra-Lightweight", 
    "Ice Hockey Men's Ice Hockey", "Weightlifting Women's Super-Heavyweight", 
    "Athletics Men's 1,500 metres", "Gymnastics Men's Individual All-Around", 
    "Gymnastics Men's Floor Exercise"), Medal = c(NA, "Bronze", 
    NA, NA, NA, NA), BMI = c(20.7612456747405, 25.1063327032136, 
    43.2525951557093, 21.7335354170837, 22.9481157445588, 22.9481157445588
    )), .Names = c("Name", "Sex", "Age", "Height", "Weight", 
"Team", "NOC", "Games", "Year", "Season", "City", "Sport", "Event", 
"Medal", "BMI"), row.names = c(NA, 6L), class = "data.frame")

Does anyone know how to translate this into base R?

Upvotes: 2

Views: 110

Answers (2)

akrun

Reputation: 887128

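A base R option is to call aggregate twice: the inner call collapses the data to one row per Sex, Season and Sport combination, and the outer call counts those rows per Sex and Season.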

out <- aggregate(BMI ~ Sex + Season, 
     aggregate(BMI ~ Sex + Season + Sport, olympics, length), length)
names(out) <- c("Competitor_Sex", "Olympic_Season", "Num_Sports")
out
#   Competitor_Sex Olympic_Season Num_Sports
#1              F         Summer          1
#2              M         Summer          3
#3              M         Winter          1
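The nested call above can be easier to follow when split into two named steps. The following is a minimal sketch, assuming a trimmed copy of the question's sample data (only the four columns used, with BMI rounded); the name per_sport is illustrative and does not appear in the original answer.

```r
# Trimmed copy of the question's sample data: only the columns used here,
# with BMI rounded to two decimals (an assumption made for brevity).
olympics <- data.frame(
  Sex    = c("M", "M", "F", "M", "M", "M"),
  Season = c("Summer", "Winter", "Summer", "Summer", "Summer", "Summer"),
  Sport  = c("Judo", "Ice Hockey", "Weightlifting", "Athletics",
             "Gymnastics", "Gymnastics"),
  BMI    = c(20.76, 25.11, 43.25, 21.73, 22.95, 22.95),
  stringsAsFactors = FALSE
)

# Step 1: one row per Sex/Season/Sport combination.
# The BMI column now holds the number of rows in each combination.
per_sport <- aggregate(BMI ~ Sex + Season + Sport, olympics, length)

# Step 2: count the rows of step 1 per Sex/Season,
# which is the number of distinct sports in each group.
out <- aggregate(BMI ~ Sex + Season, per_sport, length)
names(out) <- c("Competitor_Sex", "Olympic_Season", "Num_Sports")
out
```

One caveat: the formula method of aggregate drops rows with NA in any variable named in the formula by default (na.action = na.omit), so a column with missing values, such as Medal in the sample data, would be a poor choice for the left-hand side.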

This matches the output of the OP's dplyr code:

olympics %>% 
   group_by(Sex, Season, Sport) %>% 
   summarise(n()) %>% 
   group_by(Sex, Season) %>%
   summarise(n()) %>%
   setNames(c("Competitor_Sex", "Olympic_Season", "Num_Sports"))
# A tibble: 3 x 3
# Groups:   Sex [2]
#  Competitor_Sex Olympic_Season Num_Sports
#  <chr>          <chr>               <int>
#1 F              Summer                  1
#2 M              Summer                  3
#3 M              Winter                  1

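Alternatively, it can be done more compactly with table from base R. This pastes Sex, Season and Sport into one key per row, keeps the distinct keys, strips the Sport part, and tabulates what remains: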

table(sub(",[^,]+$", "", names(table(do.call(paste, 
        c(olympics[c("Sex", "Season", "Sport")], sep=","))))))

 #  F,Summer M,Summer M,Winter 
 #      1        3        1 
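The one-liner above is dense, so here is the same idea unpacked into intermediate steps. This is a sketch under the assumption that no value contains a comma (the regular expression relies on that); the names keys, combos, groups and counts are illustrative, and the data is a trimmed copy of the question's sample.

```r
# Trimmed copy of the question's sample data: only the grouping columns.
olympics <- data.frame(
  Sex    = c("M", "M", "F", "M", "M", "M"),
  Season = c("Summer", "Winter", "Summer", "Summer", "Summer", "Summer"),
  Sport  = c("Judo", "Ice Hockey", "Weightlifting", "Athletics",
             "Gymnastics", "Gymnastics"),
  stringsAsFactors = FALSE
)

# Build one "Sex,Season,Sport" key per row.
keys <- do.call(paste, c(olympics[c("Sex", "Season", "Sport")], sep = ","))

# The names of table(keys) are the distinct keys (duplicates collapse here).
combos <- names(table(keys))

# Strip the trailing ",Sport" so each distinct sport leaves one "Sex,Season" label.
groups <- sub(",[^,]+$", "", combos)

# Tabulating the labels gives the number of distinct sports per Sex/Season.
counts <- table(groups)
counts
```

Unlike the aggregate approach, the result is a named table rather than a data frame, so it would need a further conversion to match the three-column layout asked for in the question.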

Upvotes: 2

Ronak Shah

Reputation: 388982

Since you are grouping twice in dplyr, you can call aggregate twice in base R:

setNames(aggregate(Name~Sex + Season, 
      aggregate(Name~Sex + Season + Sport, olympics, length), length), 
       c("Competitor_Sex", "Olympic_Season", "Num_Sports"))

#   Competitor_Sex Olympic_Season Num_Sports
#1               F         Summer          1
#2               M         Summer          3
#3               M         Winter          1

This gives the same output as the dplyr option:

library(dplyr)
olympics %>% 
  group_by(Sex, Season, Sport) %>% 
  summarise(n()) %>% 
  group_by(Sex, Season) %>%
  summarise(n()) %>%
  setNames(c("Competitor_Sex", "Olympic_Season", "Num_Sports"))

#  Competitor_Sex Olympic_Season Num_Sports
#  <chr>          <chr>               <int>
#1 F              Summer                  1
#2 M              Summer                  3
#3 M              Winter                  1

Upvotes: 5
