Maxxx
Maxxx

Reputation: 3768

Aggregating two columns in R dataframe

I have a dataframe in R called food:

foodID   calories   fat    protein

 123       0.5      0.4     0.9
 432       0.65     0.3     0.7
 123       0.32     0.6     0.5
 983       0.82     0.2     0.6

and I'm trying to average up the calories and protein column by foodID.

I tried:

cal_pro <- aggregate(food[2,4], list(food$foodID), function(df) mean(df))

But it appears that i can't select the columns to be applied the mean function by food[2,4]? Could anyone help me out on this.

Upvotes: 0

Views: 52

Answers (3)

divibisan
divibisan

Reputation: 12155

Using dplyr, you can just group_by and summarize:

food %>%
    group_by(foodID) %>%
    summarize(calories_average = mean(calories),
              protein_average = mean(protein))

# A tibble: 3 x 3
  foodID calories_average protein_average
   <int>            <dbl>           <dbl>
1    123             0.41             0.7
2    432             0.65             0.7
3    983             0.82             0.6

Rather than specifying each variable, you can use summarize_at to select multiple variables to summarize at once. We pass in 2 arguments: the variables to summarize, and a list of functions to apply to them. If the list is named, as it is here, then the name is added to the summary column as a suffix (giving "calores_average" and "protein_average":

food %>%
    group_by(foodID) %>%
    summarize_at(c('calories', 'protein'), list(average = mean))

summarize_at also allows you to use various helper functions to select variables by prefix, suffix, or regex (as shown below). You can learn more about them here: ?tidyselect::select_helpers

food %>%
    group_by(foodID) %>%
    summarize_at(vars(matches('calories|protein')), list(average = mean))

Upvotes: 1

Rushabh Patel
Rushabh Patel

Reputation: 2764

You can use data.table package-

> setDT(dt)[,list(avg_calorie=mean(calories),avg_protein=mean(protein)),by=foodID]

Output-

    foodID avg_calorie avg_protein
1:    123        0.41         0.7
2:    432        0.65         0.7
3:    983        0.82         0.6

Upvotes: 0

akrun
akrun

Reputation: 886938

We can use the formula method

aggregate(cbind(calories, protein) ~ foodID, food, mean)

Or using the OP's code, it should be c(2, 4), because if we do 2, 4, it is selecting the 2nd row of 4th column by row/column indexing

aggregate(food[c(2, 4)], list(food$foodID), mean)

EDIT: Based on @RuiBarradas comments

Upvotes: 0

Related Questions