jgottwald
jgottwald

Reputation: 71

R- reshape irregular molten data frame by group that has missing values in another column

I have a large data frame that is grouped (column group in example data) into measurements that have taken place at the same time. Values (column value) were measured by 4 different devices (column device), but there was not necessarily a measurement of all 4 devices at the same time. My data looks like this

    group device value value.to.be.averaged
1     1      a    69                  173
2     1      b    75                  172
3     2      a    78                  173
4     2      c    69                  172
5     2      b    60                  173
6     2      d    62                  172
7     3      a    80                  173
8     3      c    63                  173
9     3      b    63                  172

which can be reproduced by this

my_data<-data.frame(group=c(rep(1:3, c(2,4,3))), 
           device=c("a", "b", "a", "c", "b", "d", "a", "c", "b") , 
           value=sample(60:80,9,replace=T), value.to.be.averaged=sample(172:173,9,replace=T))

I need to bring device into column, keep the group number as row id and assign each exact measured value to the devices and if a device is missing in the group assign NA. There are other variables that can be simply averaged. The data should look like this afterwards.

   group a  b c  d value.to.be.averaged
1     1 77 68 NA NA      172
2     2 60 73 70 76      173
3     3 65 76 77 NA      172

Which can be reproduced by this

my_reshaped_data<-data.frame(group=c(1:3), a=c(77, 60, 65), b=c(68,73,76), c=c(NA, 70,77), d=c(NA, 76, NA), value.to.be.averaged=c(mean(172,173,173), 
                                                                                                             mean(rep(173,3)), mean(172,172,173)))

Since my data is hughe, computation time and computational ressource is an issue. I found some examples with evenly siced groups but non with irregular groups. Any suggestions? Cheers, jannis

Upvotes: 1

Views: 52

Answers (2)

JCran
JCran

Reputation: 375

You can use the tidyverse to convert the data from long to wide format. It automatically incorporates NA into the column when the device was not used for that group

my_data %>%
  group_by(group) %>%
  mutate(average.values = mean(value.to.be.averaged)) %>%
  select(-value.to.be.averaged) %>%
  ungroup() %>%
  pivot_wider(., names_from = device, values_from = value)

Upvotes: 1

sm925
sm925

Reputation: 2678

Here is a solution using dcast from data.table:

 library(data.table)
 setDT(df)[, value.to.be.averaged := round(mean(value.to.be.averaged)), by = group]
 dcast(df, group + value.to.be.averaged ~ device)

 #     group value.to.be.averaged  a  b  c  d
 #1:     1                  172   69 75 NA NA
 #2:     2                  172   78 60 69 62
 #3:     3                  173   80 63 63 NA

data

df <- structure(list(group = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L), device = structure(c(1L, 2L, 1L, 3L, 2L, 4L, 1L, 3L, 2L), .Label = c("a","b", "c", "d"), class = "factor"), value = c(69L, 75L, 78L,69L, 60L, 62L, 80L, 63L, 63L), value.to.be.averaged = c(173L, 172L, 173L, 172L, 173L, 172L, 173L, 173L, 172L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))

Upvotes: 1

Related Questions