Reputation: 41
very new to R coding and I've been trying to format/merge rows in a df using group_by from dplyr. However, I'm not quite there yet.
This is a simplification of my data table. The first three entries share the same id, and the last two entries share the same id.
ID Assay1 Assay2 Assay3 Assay4 Assay5
13,12 Months,<=-65C 12 NA NA NA NA
13,12 Months,<=-65C NA 11 NA NA NA
13,12 Months,<=-65C NA NA 33 NA NA
09,06 Months,<=-65C 112 NA NA NA NA
09,06 Months,<=-65C NA 115 NA NA NA
I want to achieve the following:
ID Assay1 Assay2 Assay3 Assay4 Assay5
13,12 Months,<=-65C 12 11 33 NA NA
09,06 Months,<=-65C 112 115 NA NA NA
So that the values in AssayValueX are merged into one row per unique id. I've used group_by with summarise but I dont want a summary... I want a new df with the merged rows! If other functions are more appropriate to achieve this, please let me know.
Upvotes: 2
Views: 925
Reputation: 323266
You can using dplyr
and zoo
df %>%
group_by(ID) %>%
mutate_each(funs(na.locf(., na.rm = FALSE, fromLast = FALSE)))%>%filter(row_number()==n())
ID Assay1 Assay2 Assay3 Assay4 Assay5
13,12 Months,<=-65C 12 11 33 NA NA
09,06 Months,<=-65C 112 115 NA NA NA
EDIT
For you addtional question
Data:
ID Assay1 Assay2 Assay3 Assay4 Assay5
1 13,12 Months,<=-65C 12 13 NA NA NA
2 13,12 Months,<=-65C 11 11 999 NA NA
3 13,12 Months,<=-65C NA NA 33 NA NA
4 09,06 Months,<=-65C 112 NA NA NA NA
5 09,06 Months,<=-65C NA 115 NA NA NA
Solution
df=df %>%
group_by(ID) %>%
summarise_all(funs(toString(na.omit(.))))
df[df=='']=NA
> df
ID Assay1 Assay2 Assay3 Assay4 Assay5
<chr> <chr> <chr> <chr> <chr> <chr>
1 09,06 Months,<=-65C 112 115 <NA> <NA> <NA>
2 13,12 Months,<=-65C 12, 11 13, 11 999, 33 <NA> <NA>
Upvotes: 3
Reputation: 16277
Here's a tidyr
solution. With your example data, there is no Assay4 and Assay5 data because they are NA
s, but it should work with your real life data. Basically, the rationale is to gather
the data, remove the NAs and spread
it again.
library(tidyr)
df%>%
gather(Assay, value,-ID)%>%
filter(!is.na(value))%>%
spread(Assay,value)
ID Assay1 Assay2 Assay3
1 09,06 Months,<=-65C 112 115 NA
2 13,12 Months,<=-65C 12 11 33
data
df <- read.table(text="ID Assay1 Assay2 Assay3 Assay4 Assay5
'13,12 Months,<=-65C' 12 NA NA NA NA
'13,12 Months,<=-65C' NA 11 NA NA NA
'13,12 Months,<=-65C' NA NA 33 NA NA
'09,06 Months,<=-65C' 112 NA NA NA NA
'09,06 Months,<=-65C' NA 115 NA NA NA",
header=TRUE,stringsAsFactors=FALSE)
Upvotes: 2
Reputation: 51592
Another possible solution via dplyr
,
library(dplyr)
df %>%
group_by(ID) %>%
summarise_all(funs(sum(., na.rm = TRUE))) %>%
mutate_all(funs(replace(., . == 0, NA)))
# A tibble: 2 x 6
# ID Assay1 Assay2 Assay3 Assay4 Assay5
# <fctr> <int> <int> <int> <int> <int>
#1 09,06_Months,<=-65C 112 115 NA NA NA
#2 13,12_Months,<=-65C 12 11 33 NA NA
Upvotes: 1