sum/average different columns in dataframe R

Question

I have a data frame with 4 columns:

a <- data.frame(a=c(1,2,3,4), b=c(4,5,6,7), c=c(7,6,5,4), d=c(8,4,3,2))

I want to average the first two columns and the last two columns, so that I get one data frame with the same number of rows and two columns: the average of the first two columns and the average of the last two.

expected output:

redmode · Accepted Answer

To reproduce your output (which is a sum, not a mean):

library(plyr)
ddply(a, .(), summarise, first=a+b, second=c+d)[,-1]

It produces:

  first second
1     5     15
2     7     10
3     9      8
4    11      6

To get a data frame of averages instead:

ddply(a, .(), summarise, first=(a+b)/2, second=(c+d)/2)[,-1]

Output is:

  first second
1   2.5    7.5
2   3.5    5.0
3   4.5    4.0
4   5.5    3.0
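The same averages can also be computed without plyr. This is a minimal sketch in base R, not part of the original answer. It assumes the columns to be averaged are the first two and the last two by position, and the result name res is only illustrative:

```r
# Data frame from the question
a <- data.frame(a=c(1,2,3,4), b=c(4,5,6,7), c=c(7,6,5,4), d=c(8,4,3,2))

# rowMeans() averages across the selected columns, one value per row
res <- data.frame(first  = rowMeans(a[, 1:2]),
                  second = rowMeans(a[, 3:4]))
print(res)
```

Swapping rowMeans() for rowSums() gives the sums from the first example.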

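If you don't know the column names, you can refer to the columns by position. Note that inside summarise a bare a resolves to the column a, not to the data frame, so with the example data a[,1] fails because a is then a plain vector. Copy the data frame to a name that is not also a column name first:

dat <- a  # any name that does not clash with a column
ddply(dat, .(), summarise, first=dat[,1]+dat[,2], second=dat[,3]+dat[,4])[,-1]

Alternatively, you can simply run names(a) <- letters[1:4] before calling ddply() and then use the names a, b, c and d.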

ddply is a very flexible function: you can pass grouping variables as the second argument and get results per group. But if the case is as simple as the one in the question, you can call summarise directly:

summarise(a, first=a+b, second=c+d)                           # if you know the column names
dat <- a                                                      # a name that is not also a column name
summarise(dat, first=dat[,1]+dat[,2], second=dat[,3]+dat[,4]) # if you don't know the column names
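To illustrate the grouping mentioned above, here is a rough sketch of a grouped call. The grouping column g and its values are hypothetical; they are not in the question and are added only for illustration. Taking mean() of the row sums within each group is likewise just one possible summary:

```r
library(plyr)

# Data frame from the question, plus a hypothetical grouping column g
a <- data.frame(a=c(1,2,3,4), b=c(4,5,6,7), c=c(7,6,5,4), d=c(8,4,3,2))
a$g <- c("x", "x", "y", "y")

# One output row per value of g; inside summarise, a, b, c and d are
# the columns of the current group's rows
res <- ddply(a, .(g), summarise, first = mean(a + b), second = mean(c + d))
print(res)
```

With a real grouping variable there is one row per group, and the grouping column is kept in the result, so there is no need for the [,-1] used with the empty .().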
