Reputation: 867
I have a dataframe that I have obtained after converting a long format dataframe into a wide format dataframe using tidyr::spread()
The dataframe
orig_dataframe = data.frame(ID = c(1,1,2,2,3,3,4,4), Name = c("","a", "", "b", "", "c", "", "d"), Val1=(NA, 10,NA, 20, NA, 30,NA,40), Val2 = (100,NA, 200,NA, 300, NA, 400,NA))
looks like below
ID Name Val1 Val2
1 1 NA 10
2 1 a 100 NA
3 2 NA 20
4 2 b 200 NA
5 3 NA 30
6 3 c 300 NA
7 4 NA 40
8 4 d 400 NA
I want to combine the rows with duplicate IDs so the dataframe looks like this:
ID Name Val1 Val2
1 1 a 100 10
2 2 b 200 20
3 3 c 300 30
4 4 d 400 40
How can I do that?
Upvotes: 1
Views: 1437
Reputation: 887128
We may convert the blank (""
to NA
(na_if
), then grouped by 'ID", remove the NA elements in the columns (assuming the number of non-NA elements are same per group for each column)
library(dplyr)
orig_dataframe %>%
na_if("") %>%
group_by(ID) %>%
summarise(across(everything(), ~.x[complete.cases(.x)]))
-output
# A tibble: 4 × 4
ID Name Val1 Val2
<dbl> <chr> <dbl> <dbl>
1 1 a 10 100
2 2 b 20 200
3 3 c 30 300
4 4 d 40 400
Upvotes: 2