Jason Heppler
Jason Heppler

Reputation: 716

Adding columns for min, mean, and max to a data frame

I'm dipping my toes into R and am looking to do something that seems fairly straightforward, but I'm a little lost as to how to proceed. I'm reading a CSV that looks something like this:

translation,category,macrocategory,subcategory,1640,1641,1642,1643,1644
almonds,nuts,Flavoring/Other,,,491,,,
apples,Fruit,Fruits and Vegetables,42,,67,,,
Atlantic herring,Fish,Meat,,52,0,9,,
aurochs,Meat,Meat,game,,4,25,5,
bacon,Meat,Meat,pork,,275.87,78,92,0
barley groats,Grain,Grain,5,9,2,14,56,9
beef,Meat,Meat,Beef,,5.25,,,

What I'd like to have is to add new columns for the maximum, minimum, and mean for each food item (so, max, min, and mean for each row). I've handled some of the basic cleanup, but I'm a bit lost as to how to proceed from there.

library(dplyr)
library(tidyr)

df <- read.csv("foods.csv", sep=",", header = T)
food.clean <- data.frame(foodname=df[,1], data.matrix(df[,5:53]))
str(food.clean) # check

food.clean <- food.clean[rowSums(is.na(food.clean)) < 48, ]
summary(food.clean)

I also tried using the matrixStats package, but ran into an error:

library(matrixStats)
food.matrix <- as.matrix(food.clean)
cbind(food.clean, mean=rowMeans(food.matrix), sd=rowSds(food.matrix), max=rowMaxs(food.matrix))

Error in rowMeans(food.matrix) : 'x' must be numeric

Any pointers? Thanks!

Upvotes: 0

Views: 2313

Answers (2)

Taekyun Kim
Taekyun Kim

Reputation: 146

I guess that you would like to get the mean, min, and max for all numeric variables for each row.

If you have lots of numerical variables, it may be easier to tidy the data first as described in http://www.jstatsoft.org/v59/i10/

For instance, you can do the following.

library(dplyr)
library(tidyr)

df <- read.csv(text="
translation,category,macrocategory,subcategory,1640,1641,1642,1643,1644
almonds,nuts,Flavoring/Other,,,491,,,
apples,Fruit,Fruits and Vegetables,42,,67,,,
Atlantic herring,Fish,Meat,,52,0,9,,
aurochs,Meat,Meat,game,,4,25,5,
bacon,Meat,Meat,pork,,275.87,78,92,0
barley groats,Grain,Grain,5,9,2,14,56,9
beef,Meat,Meat,Beef,,5.25,,,"
)

info <- 
    df %>%
    # tidy data
    gather(variable, value, -(1:4)) %>%
    # summarise by food item 'translation'
    group_by(translation) %>%
    summarise(
        mean = mean(value, na.rm=TRUE),
        min = min(value, na.rm=TRUE),
        max = max(value, na.rm=TRUE)
    )

This will give you

       translation      mean    min    max
1          almonds 491.00000 491.00 491.00
2           apples  67.00000  67.00  67.00
3 Atlantic herring  20.33333   0.00  52.00
4          aurochs  11.33333   4.00  25.00
5            bacon 111.46750   0.00 275.87
6    barley groats  18.00000   2.00  56.00
7             beef   5.25000   5.25   5.25    

If you still need the original numerical variables, you can join this info with df.

df %>% left_join(info)

For more details, I would highly recommend reading up on tidy data paper and study dplyr and tidyr package.

Upvotes: 1

JasonAizkalns
JasonAizkalns

Reputation: 20463

It's not entirely clear what your sample data looks like (specifically, the 1640,1641,1642,1643,1644 in your header row), but this should show you one way of doing this with the mtcars sample dataset. If you could dput your actual data, it would be easier to help. Specifically, you could use dplyr::rowwise:

library(dplyr)
df <- tbl_df(mtcars)

head(df)
# Source: local data frame [6 x 11]
# 
#    mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1


df %>%
  rowwise() %>%
  summarise(min = min(am, gear, carb),
            max = max(am, gear, carb),
            mean = mean(am, gear, carb))

#    min max mean
# 1    1   4    1
# 2    1   4    1
# 3    1   4    1
# 4    0   3    0
# 5    0   3    0
# 6    0   3    0
# 7    0   4    0
# 8    0   4    0
# 9    0   4    0
# 10   0   4    0
# .. ... ...  ...

Upvotes: 0

Related Questions