Reputation:

Merge columns within dataframe based on column value R

I currently have a data frame of this structure

    ID-No  cigsaday   activity  
    1      NA        1           
    2      NA        1          
    1       5       NA          
    2       5       NA

I want to concatenate the rows with the identical ID numbers and create a new data frame that is supposed to look like this

ID-No  cigsaday   activity  
    1      5        1           
    2      5        1

The data frame includes characters as well as numerical, in this way we would match based on a participant ID which occurs 4 times in the dataset within the first column.

Any help is appreciated!

Upvotes: 0

Answers (3)

Uwe

Reputation: 42582

Many ways lead to Rome. For the sake of completeness, here are some other approaches which return the expected result for the given sample dataset. Your mileage may vary.

1. dplyr, na.omit()

library(dplyr)
df %>% 
  group_by(ID_No) %>% 
  summarise(across(everything(), na.omit))

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 3
  ID_No cigsaday activity
  <int>    <int>    <int>
1     1        5        1
2     2        5        1

Note, this a dplyr version of ThomasIsCoding's answer.

2. dplyr, reduce(), coalesce()

library(dplyr)
df %>% 
  group_by(ID_No) %>% 
  summarise(across(everything(), ~ purrr::reduce(.x, coalesce)))

3. data.table, fcoalesce()

library(data.table)
setDT(df)[, lapply(.SD, function(x) fcoalesce(as.list(x))), ID_No]

   ID_No cigsaday activity
1:     1        5        1
2:     2        5        1

4. data.table, Reduce(), fcoalesce()

library(data.table)
setDT(df)[, lapply(.SD, Reduce, f = fcoalesce), ID_No]

Upvotes: 1

ThomasIsCoding

Reputation: 102625

A data.table option

> setDT(df)[, lapply(.SD, na.omit), ID_No]
   ID_No cigsaday activity
1:     1        5        1
2:     2        5        1

Data

> dput(df)
structure(list(ID_No = c(1L, 2L, 1L, 2L), cigsaday = c(NA, NA,
5L, 5L), activity = c(1L, 1L, NA, NA)), class = "data.frame", row.names = c(NA,
-4L))