How to merge rows of a dataframe that have the same value in a column

Question

I have a dataframe that I have obtained after converting a long format dataframe into a wide format dataframe using tidyr::spread()

The dataframe

orig_dataframe = data.frame(ID = c(1,1,2,2,3,3,4,4), Name = c("","a", "", "b", "", "c", "", "d"), Val1=(NA, 10,NA, 20, NA, 30,NA,40), Val2 = (100,NA, 200,NA, 300, NA, 400,NA))

looks like below

    ID    Name    Val1    Val2
1   1              NA      10
2   1      a       100     NA
3   2              NA      20
4   2      b       200     NA
5   3              NA      30
6   3      c       300     NA
7   4              NA      40
8   4      d       400     NA

I want to combine the rows with duplicate IDs so the dataframe looks like this:

    ID    Name    Val1    Val2
1   1      a       100      10
2   2      b       200      20
3   3      c       300      30
4   4      d       400      40

How can I do that?

akrun · Accepted Answer

We may convert the blank ("" to NA (na_if), then grouped by 'ID", remove the NA elements in the columns (assuming the number of non-NA elements are same per group for each column)

library(dplyr)
orig_dataframe %>%
  na_if("") %>%
  group_by(ID) %>%
  summarise(across(everything(),  ~.x[complete.cases(.x)]))

-output

# A tibble: 4 × 4
     ID Name   Val1  Val2
     
1     1 a        10   100
2     2 b        20   200
3     3 c        30   300
4     4 d        40   400

How to merge rows of a dataframe that have the same value in a column

Answers (1)

Related Questions