Reputation: 13
I have a data frame like this
subject_id area side value confound1 confound2 confound3
s01 A left 5 154 952 no
s01 A right 7 154 952 no
s01 B left 15 154 952 no
s01 B right 17 154 952 no
s02 A left 3 130 870 yes
s02 A right 5 130 870 yes
s02 B left 12 130 870 yes
s02 B right 11 130 870 yes
I would like to add rows of averages between left and right for each area for each subject while keeping the values of the other variables:
subject_id area side value confound1 confound2 confound3
s01 A left 5 154 952 no
s01 A right 7 154 952 no
s01 A avg 6 154 952 no
s01 B left 15 154 952 no
s01 B right 17 154 952 no
s01 B avg 16 154 952 no
s02 A left 3 130 870 yes
s02 A right 5 130 870 yes
s02 A avg 4 130 870 yes
s02 B left 12 130 870 yes
s02 B right 11 130 870 yes
s02 B avg 11.5 130 870 yes
Any suggestions for how to do this?
Upvotes: 0
Views: 1845
Reputation: 886938
An option using data.table
library(data.table)
rbind(setDT(df)[, .(side = 'avg', value=mean(value)) ,
.(subject_id, area, confound1, confound2, confound3)][,
names(df), with=FALSE], df)[order(subject_id, area,
factor(side, levels=c('left', 'right', 'ave')))]
# subject_id area side value confound1 confound2 confound3
# 1: s01 A left 5.0 154 952 no
# 2: s01 A right 7.0 154 952 no
# 3: s01 A avg 6.0 154 952 no
# 4: s01 B left 15.0 154 952 no
# 5: s01 B right 17.0 154 952 no
# 6: s01 B avg 16.0 154 952 no
# 7: s02 A left 3.0 130 870 yes
# 8: s02 A right 5.0 130 870 yes
# 9: s02 A avg 4.0 130 870 yes
#10: s02 B left 12.0 130 870 yes
#11: s02 B right 11.0 130 870 yes
#12: s02 B avg 11.5 130 870 yes
Upvotes: 1
Reputation: 38500
Here is a method with base R functions aggregate
and rbind
.
# get the data
df <- read.table(header=T, text="subject_id area side value confound1 confound2 confound3
s01 A left 5 154 952 no
s01 A right 7 154 952 no
s01 B left 15 154 952 no
s01 B right 17 154 952 no
s02 A left 3 130 870 yes
s02 A right 5 130 870 yes
s02 B left 12 130 870 yes
s02 B right 11 130 870 yes")
# get the average values
dfAgg <- aggregate(cbind(value=value, confound1=confound1,
confound2=confound2, confound3=confound3) ~
subject_id + area, data=df, FUN=mean)
# add variables
dfAgg$side <- "side.avg"
dfAgg$confound3 <- factor(dfAgg$confound3, labels=c("no", "yes"))
#rbind the averages
dfFinal <- rbind(df, dfAgg)
# order the data
dfFinal <- dfFinal[order(dfFinal$subject_id, dfFinal$area, dfFinal$side),]
Upvotes: 3
Reputation: 10473
Using library dplyr
, you can do something like this:
library(dplyr)
df %>% group_by(subject_id, area) %>% mutate(mean_left_right = mean(value))
Output is:
Source: local data frame [8 x 8]
Groups: subject_id, area [4]
subject_id area side value confound1 confound2 confound3 mean_left_right
<chr> <chr> <chr> <int> <int> <int> <chr> <dbl>
1 s01 A left 5 154 952 no 6.0
2 s01 A right 7 154 952 no 6.0
3 s01 B left 15 154 952 no 16.0
4 s01 B right 17 154 952 no 16.0
5 s02 A left 3 130 870 yes 4.0
6 s02 A right 5 130 870 yes 4.0
7 s02 B left 12 130 870 yes 11.5
8 s02 B right 11 130 870 yes 11.5
Upvotes: 2
Reputation: 1981
I'd use tidyr
to gather and then spread your data.
library(dplyr)
library(tidyr)
df %>%
spread(side, value) %>%
mutate(avg = (left + right)/2) %>%
gather(side, value, left:avg)
subject_id area confound1 confound2 confound3 side value
1 s01 A 154 952 no left 5.0
2 s01 B 154 952 no left 15.0
3 s02 A 130 870 yes left 3.0
4 s02 B 130 870 yes left 12.0
5 s01 A 154 952 no right 7.0
6 s01 B 154 952 no right 17.0
7 s02 A 130 870 yes right 5.0
8 s02 B 130 870 yes right 11.0
9 s01 A 154 952 no avg 6.0
10 s01 B 154 952 no avg 16.0
11 s02 A 130 870 yes avg 4.0
12 s02 B 130 870 yes avg 11.5
Upvotes: 1