Reputation: 71
I have a large data frame that is grouped (column group in example data) into measurements that have taken place at the same time. Values (column value) were measured by 4 different devices (column device), but there was not necessarily a measurement of all 4 devices at the same time. My data looks like this
group device value value.to.be.averaged
1 1 a 69 173
2 1 b 75 172
3 2 a 78 173
4 2 c 69 172
5 2 b 60 173
6 2 d 62 172
7 3 a 80 173
8 3 c 63 173
9 3 b 63 172
which can be reproduced by this
my_data<-data.frame(group=c(rep(1:3, c(2,4,3))),
device=c("a", "b", "a", "c", "b", "d", "a", "c", "b") ,
value=sample(60:80,9,replace=T), value.to.be.averaged=sample(172:173,9,replace=T))
I need to bring device into column, keep the group number as row id and assign each exact measured value to the devices and if a device is missing in the group assign NA. There are other variables that can be simply averaged. The data should look like this afterwards.
group a b c d value.to.be.averaged
1 1 77 68 NA NA 172
2 2 60 73 70 76 173
3 3 65 76 77 NA 172
Which can be reproduced by this
my_reshaped_data<-data.frame(group=c(1:3), a=c(77, 60, 65), b=c(68,73,76), c=c(NA, 70,77), d=c(NA, 76, NA), value.to.be.averaged=c(mean(172,173,173),
mean(rep(173,3)), mean(172,172,173)))
Since my data is hughe, computation time and computational ressource is an issue. I found some examples with evenly siced groups but non with irregular groups. Any suggestions? Cheers, jannis
Upvotes: 1
Views: 52
Reputation: 375
You can use the tidyverse to convert the data from long to wide format. It automatically incorporates NA into the column when the device was not used for that group
my_data %>%
group_by(group) %>%
mutate(average.values = mean(value.to.be.averaged)) %>%
select(-value.to.be.averaged) %>%
ungroup() %>%
pivot_wider(., names_from = device, values_from = value)
Upvotes: 1
Reputation: 2678
Here is a solution using dcast
from data.table
:
library(data.table)
setDT(df)[, value.to.be.averaged := round(mean(value.to.be.averaged)), by = group]
dcast(df, group + value.to.be.averaged ~ device)
# group value.to.be.averaged a b c d
#1: 1 172 69 75 NA NA
#2: 2 172 78 60 69 62
#3: 3 173 80 63 63 NA
df <- structure(list(group = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), device = structure(c(1L, 2L, 1L, 3L, 2L, 4L, 1L, 3L, 2L), .Label = c("a","b", "c", "d"), class = "factor"), value = c(69L, 75L, 78L,69L, 60L, 62L, 80L, 63L, 63L), value.to.be.averaged = c(173L, 172L, 173L, 172L, 173L, 172L, 173L, 173L, 172L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))
Upvotes: 1