user14563529
user14563529

Reputation:

Merge columns within dataframe based on column value R

I currently have a data frame of this structure

    ID-No  cigsaday   activity  
    1      NA        1           
    2      NA        1          
    1       5       NA          
    2       5       NA          

I want to concatenate the rows with the identical ID numbers and create a new data frame that is supposed to look like this

ID-No  cigsaday   activity  
    1      5        1           
    2      5        1

The data frame includes characters as well as numerical, in this way we would match based on a participant ID which occurs 4 times in the dataset within the first column.

Any help is appreciated!

Upvotes: 0

Views: 411

Answers (3)

Uwe
Uwe

Reputation: 42544

Many ways lead to Rome. For the sake of completeness, here are some other approaches which return the expected result for the given sample dataset. Your mileage may vary.

1. dplyr, na.omit()

library(dplyr)
df %>% 
  group_by(ID_No) %>% 
  summarise(across(everything(), na.omit))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 3
  ID_No cigsaday activity
  <int>    <int>    <int>
1     1        5        1
2     2        5        1

Note, this a dplyr version of ThomasIsCoding's answer.

2. dplyr, reduce(), coalesce()

library(dplyr)
df %>% 
  group_by(ID_No) %>% 
  summarise(across(everything(), ~ purrr::reduce(.x, coalesce)))

3. data.table, fcoalesce()

library(data.table)
setDT(df)[, lapply(.SD, function(x) fcoalesce(as.list(x))), ID_No]
   ID_No cigsaday activity
1:     1        5        1
2:     2        5        1

4. data.table, Reduce(), fcoalesce()

library(data.table)
setDT(df)[, lapply(.SD, Reduce, f = fcoalesce), ID_No]

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 101064

A data.table option

> setDT(df)[, lapply(.SD, na.omit), ID_No]
   ID_No cigsaday activity
1:     1        5        1
2:     2        5        1

Data

> dput(df)
structure(list(ID_No = c(1L, 2L, 1L, 2L), cigsaday = c(NA, NA,
5L, 5L), activity = c(1L, 1L, NA, NA)), class = "data.frame", row.names = c(NA,
-4L))

Upvotes: 1

jsv
jsv

Reputation: 740

A possible solution using na.locf() which replaces a value with the most recent non-NA value.

library(zoo)

dat %>% 
  group_by(IDNo) %>% 
  mutate_at(vars(-group_cols()),.funs=function(x) na.locf(x)) %>% 
  distinct(IDNo,cigsaday,activity,.keep_all = TRUE) %>% 
  ungroup()

Upvotes: 0

Related Questions