pdench
pdench

Reputation: 41

Merge rows and fill empty positions with group_by and?

very new to R coding and I've been trying to format/merge rows in a df using group_by from dplyr. However, I'm not quite there yet.

This is a simplification of my data table. The first three entries share the same id, and the last two entries share the same id.

ID                   Assay1  Assay2  Assay3  Assay4  Assay5
13,12 Months,<=-65C  12      NA      NA      NA      NA
13,12 Months,<=-65C  NA      11      NA      NA      NA
13,12 Months,<=-65C  NA      NA      33      NA      NA
09,06 Months,<=-65C  112     NA      NA      NA      NA
09,06 Months,<=-65C  NA      115     NA      NA      NA

I want to achieve the following:

ID                   Assay1  Assay2  Assay3  Assay4  Assay5
13,12 Months,<=-65C  12      11      33      NA      NA
09,06 Months,<=-65C  112     115     NA      NA      NA

So that the values in AssayValueX are merged into one row per unique id. I've used group_by with summarise but I dont want a summary... I want a new df with the merged rows! If other functions are more appropriate to achieve this, please let me know.

Upvotes: 2

Views: 925

Answers (3)

BENY
BENY

Reputation: 323266

You can using dplyr and zoo

df %>%
    group_by(ID) %>%
    mutate_each(funs(na.locf(., na.rm = FALSE, fromLast = FALSE)))%>%filter(row_number()==n())


ID                   Assay1  Assay2  Assay3  Assay4  Assay5
13,12 Months,<=-65C  12      11      33      NA      NA
09,06 Months,<=-65C  112     115     NA      NA      NA

EDIT

For you addtional question

Data:

ID Assay1 Assay2 Assay3 Assay4 Assay5
1 13,12 Months,<=-65C     12     13     NA     NA     NA
2 13,12 Months,<=-65C     11     11    999     NA     NA
3 13,12 Months,<=-65C     NA     NA     33     NA     NA
4 09,06 Months,<=-65C    112     NA     NA     NA     NA
5 09,06 Months,<=-65C     NA    115     NA     NA     NA

Solution

df=df %>%
    group_by(ID) %>%
    summarise_all(funs(toString(na.omit(.))))
df[df=='']=NA
    > df
                   ID Assay1 Assay2  Assay3 Assay4 Assay5
                <chr>  <chr>  <chr>   <chr>  <chr>  <chr>
1 09,06 Months,<=-65C    112    115    <NA>   <NA>   <NA>
2 13,12 Months,<=-65C 12, 11 13, 11 999, 33   <NA>   <NA>

Upvotes: 3

Pierre Lapointe
Pierre Lapointe

Reputation: 16277

Here's a tidyr solution. With your example data, there is no Assay4 and Assay5 data because they are NAs, but it should work with your real life data. Basically, the rationale is to gather the data, remove the NAs and spread it again.

library(tidyr)
df%>%
  gather(Assay, value,-ID)%>% 
  filter(!is.na(value))%>% 
  spread(Assay,value)

                   ID Assay1 Assay2 Assay3
1 09,06 Months,<=-65C    112    115     NA
2 13,12 Months,<=-65C     12     11     33

data

df <- read.table(text="ID                   Assay1  Assay2  Assay3  Assay4  Assay5
'13,12 Months,<=-65C'  12      NA      NA      NA      NA
'13,12 Months,<=-65C'  NA      11      NA      NA      NA
'13,12 Months,<=-65C'  NA      NA      33      NA      NA
'09,06 Months,<=-65C'  112     NA      NA      NA      NA
'09,06 Months,<=-65C'  NA      115     NA      NA      NA",
 header=TRUE,stringsAsFactors=FALSE)

Upvotes: 2

Sotos
Sotos

Reputation: 51592

Another possible solution via dplyr,

library(dplyr)

df %>% 
  group_by(ID) %>%  
  summarise_all(funs(sum(., na.rm = TRUE))) %>% 
  mutate_all(funs(replace(., . == 0, NA)))

# A tibble: 2 x 6
#                   ID Assay1 Assay2 Assay3 Assay4 Assay5
#               <fctr>  <int>  <int>  <int>  <int>  <int>
#1 09,06_Months,<=-65C    112    115     NA     NA     NA
#2 13,12_Months,<=-65C     12     11     33     NA     NA

Upvotes: 1

Related Questions