Reputation: 13

R: dataframe - adding rows with means

I have a data frame like this

subject_id area side value confound1 confound2 confound3
s01 A left 5 154 952 no
s01 A right 7 154 952 no
s01 B left 15 154 952 no
s01 B right 17 154 952 no
s02 A left 3 130 870 yes
s02 A right 5 130 870 yes
s02 B left 12 130 870 yes
s02 B right 11 130 870 yes

I would like to add rows of averages between left and right for each area for each subject while keeping the values of the other variables:

subject_id area side value confound1 confound2 confound3
s01 A left 5 154 952 no
s01 A right 7 154 952 no
s01 A avg 6 154 952 no
s01 B left 15 154 952 no
s01 B right 17 154 952 no
s01 B avg 16 154 952 no
s02 A left 3 130 870 yes
s02 A right 5 130 870 yes
s02 A avg 4 130 870 yes
s02 B left 12 130 870 yes
s02 B right 11 130 870 yes
s02 B avg 11.5 130 870 yes

Any suggestions for how to do this?

Upvotes: 0

Answers (4)

akrun

Reputation: 886938

An option using data.table

library(data.table)
rbind(setDT(df)[, .(side = 'avg', value=mean(value)) , 
   .(subject_id, area, confound1, confound2, confound3)][,
   names(df), with=FALSE], df)[order(subject_id, area,
      factor(side, levels=c('left', 'right', 'ave')))]
#    subject_id area  side value confound1 confound2 confound3
# 1:        s01    A  left   5.0       154       952        no
# 2:        s01    A right   7.0       154       952        no
# 3:        s01    A   avg   6.0       154       952        no
# 4:        s01    B  left  15.0       154       952        no
# 5:        s01    B right  17.0       154       952        no
# 6:        s01    B   avg  16.0       154       952        no
# 7:        s02    A  left   3.0       130       870       yes
# 8:        s02    A right   5.0       130       870       yes
# 9:        s02    A   avg   4.0       130       870       yes
#10:        s02    B  left  12.0       130       870       yes
#11:        s02    B right  11.0       130       870       yes
#12:        s02    B   avg  11.5       130       870       yes

Upvotes: 1

lmo

Reputation: 38500

Here is a method with base R functions aggregate and rbind.

# get the data
 df <- read.table(header=T, text="subject_id area side value confound1 confound2 confound3
 s01 A left 5 154 952 no
                  s01 A right 7 154 952 no
                  s01 B left 15 154 952 no
                  s01 B right 17 154 952 no
                  s02 A left 3 130 870 yes
                  s02 A right 5 130 870 yes
                  s02 B left 12 130 870 yes
                  s02 B right 11 130 870 yes")

# get the average values
dfAgg <- aggregate(cbind(value=value, confound1=confound1, 
                         confound2=confound2, confound3=confound3) ~ 
                     subject_id + area, data=df, FUN=mean)
# add variables
dfAgg$side <- "side.avg"
dfAgg$confound3 <- factor(dfAgg$confound3, labels=c("no", "yes"))

#rbind the averages    
dfFinal <- rbind(df, dfAgg)

# order the data
dfFinal <- dfFinal[order(dfFinal$subject_id, dfFinal$area, dfFinal$side),]

Upvotes: 3

Gopala

Reputation: 10473

Using library dplyr, you can do something like this:

library(dplyr)
df %>% group_by(subject_id, area) %>% mutate(mean_left_right = mean(value))

Output is:

Source: local data frame [8 x 8]
Groups: subject_id, area [4]

  subject_id  area  side value confound1 confound2 confound3 mean_left_right
       <chr> <chr> <chr> <int>     <int>     <int>     <chr>           <dbl>
1        s01     A  left     5       154       952        no             6.0
2        s01     A right     7       154       952        no             6.0
3        s01     B  left    15       154       952        no            16.0
4        s01     B right    17       154       952        no            16.0
5        s02     A  left     3       130       870       yes             4.0
6        s02     A right     5       130       870       yes             4.0
7        s02     B  left    12       130       870       yes            11.5
8        s02     B right    11       130       870       yes            11.5

Upvotes: 2

Mhairi McNeill

Reputation: 1981

I'd use tidyr to gather and then spread your data.

library(dplyr)
library(tidyr)

df %>%
  spread(side, value) %>%
  mutate(avg = (left + right)/2) %>%
  gather(side, value, left:avg)

       subject_id area confound1 confound2 confound3  side value
1         s01    A       154       952        no  left   5.0
2         s01    B       154       952        no  left  15.0
3         s02    A       130       870       yes  left   3.0
4         s02    B       130       870       yes  left  12.0
5         s01    A       154       952        no right   7.0
6         s01    B       154       952        no right  17.0
7         s02    A       130       870       yes right   5.0
8         s02    B       130       870       yes right  11.0
9         s01    A       154       952        no   avg   6.0
10        s01    B       154       952        no   avg  16.0
11        s02    A       130       870       yes   avg   4.0
12        s02    B       130       870       yes   avg  11.5

Upvotes: 1

R: dataframe - adding rows with means

Answers (4)

Related Questions