I have a dataframe df with only string values. I need to aggregate its rows on id and session and fill in the NA values. My original dataframe has 50 columns, but this is just an example dataframe. You can assume that, for each combination of id and session, the non-NA values of each column (string1 or string2) are the same.
id <- c('a', 'a', 'a', 'b', 'b', 'b')
session <- c('s1', 's1', 's1', 's2', 's2', 's2')
string1 <- c('first_string1', NA, 'first_string1', NA, 'first_string3', NA)
string2 <- c(NA, 'second_string2', 'second_string2', 'second_string4', NA, NA)
df <- data.frame(id, session, string1, string2)
df
id session string1 string2
1 a s1 first_string1 <NA>
2 a s1 <NA> second_string2
3 a s1 first_string1 second_string2
4 b s2 <NA> second_string4
5 b s2 first_string3 <NA>
6 b s2 <NA> <NA>
The final dataframe should look like this:
id session string1 string2
1 a s1 first_string1 second_string2
2 b s2 first_string3 second_string4
I have tried using the aggregate function, but I can't figure out how to get it working.
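Before collapsing anything, it may be worth confirming the assumption above (at most one distinct non-NA value per column within each id/session group). A minimal base R sketch, assuming every column other than id and session is a value column; value_cols, n_distinct_non_na and assumption_holds are illustrative names, and the data is rebuilt with stringsAsFactors = FALSE so it behaves the same in older R versions:

```r
df <- data.frame(
  id      = c("a", "a", "a", "b", "b", "b"),
  session = c("s1", "s1", "s1", "s2", "s2", "s2"),
  string1 = c("first_string1", NA, "first_string1", NA, "first_string3", NA),
  string2 = c(NA, "second_string2", "second_string2", "second_string4", NA, NA),
  stringsAsFactors = FALSE
)

# Assumption: every column except the two keys holds values to collapse.
value_cols <- setdiff(names(df), c("id", "session"))

# Count the distinct non-NA values of each value column within each group.
n_distinct_non_na <- aggregate(df[value_cols],
                               by  = df[c("id", "session")],
                               FUN = function(x) length(unique(na.omit(x))))
print(n_distinct_non_na)

# The assumption holds when no group has more than one distinct value.
assumption_holds <- all(n_distinct_non_na[value_cols] <= 1)
assumption_holds  # TRUE for the example data
```

A count of 0 means the column is entirely NA in that group, which the collapsing approaches below handle differently.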
With aggregate you can do something like this, passing a function that drops the NAs and keeps the unique values within each group:
aggregate(df[c("string1", "string2")],
          by = list(id = df$id, session = df$session),
          FUN = function(x) unique(na.omit(x)))
#### OUTPUT ####
id session string1 string2
1 a s1 first_string1 second_string2
2 b s2 first_string3 second_string4
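With 50 columns, listing every name by hand gets tedious. A minimal sketch of the same aggregate call applied to all value columns at once, assuming every column other than id and session should be collapsed (key_cols, value_cols and collapsed are illustrative names):

```r
df <- data.frame(
  id      = c("a", "a", "a", "b", "b", "b"),
  session = c("s1", "s1", "s1", "s2", "s2", "s2"),
  string1 = c("first_string1", NA, "first_string1", NA, "first_string3", NA),
  string2 = c(NA, "second_string2", "second_string2", "second_string4", NA, NA),
  stringsAsFactors = FALSE
)

key_cols   <- c("id", "session")
value_cols <- setdiff(names(df), key_cols)  # assumption: all non-key columns

# Collapse each value column to its single non-NA value per group.
# Taking the grouping columns from df itself avoids relying on global vectors.
collapsed <- aggregate(df[value_cols],
                       by  = df[key_cols],
                       FUN = function(x) unique(na.omit(x)))
collapsed
```

This still relies on each group having exactly one distinct non-NA value per column; if some group has zero or several, that column comes back as a list or matrix column instead of a plain character vector.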
Base R's merge is another, perhaps slightly easier to understand, option:
unique(na.omit(merge(df[c("id", "session", "string1")],
df[c("id", "session", "string2")],
by = c("id", "session")
)))
#### OUTPUT ####
id session string1 string2
1 a s1 first_string1 second_string2
2 b s2 first_string3 second_string4
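Merging the full string1 and string2 subsets pairs every row of a group with every other row (3 x 3 = 9 rows per group here) before na.omit discards the incomplete pairs, and that multiplies quickly across 50 columns. A sketch of the same merge idea that first reduces each column to one row per group and then chains the joins with Reduce; pieces and result are illustrative names:

```r
df <- data.frame(
  id      = c("a", "a", "a", "b", "b", "b"),
  session = c("s1", "s1", "s1", "s2", "s2", "s2"),
  string1 = c("first_string1", NA, "first_string1", NA, "first_string3", NA),
  string2 = c(NA, "second_string2", "second_string2", "second_string4", NA, NA),
  stringsAsFactors = FALSE
)

key_cols   <- c("id", "session")
value_cols <- setdiff(names(df), key_cols)

# One small table per value column: keys plus that column, NA rows dropped,
# duplicates removed, so each group keeps a single row.
pieces <- lapply(value_cols, function(col) unique(na.omit(df[c(key_cols, col)])))

# Join the per-column tables back together on the keys.
result <- Reduce(function(left, right) merge(left, right, by = key_cols), pieces)
result
```

merge performs an inner join by default, so a group whose column is entirely NA would be dropped here; passing all = TRUE to merge keeps such groups with NA instead.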
Another option is:
library(dplyr)
df %>%
group_by(id, session) %>%
summarise_at(vars(starts_with("string")), ~unique(na.omit(.)))
# A tibble: 2 x 4
# Groups: id [2]
id session string1 string2
<chr> <chr> <chr> <chr>
1 a s1 first_string1 second_string2
2 b s2 first_string3 second_string4
A base R solution:
aggregate(cbind(string1, string2) ~ id + session, data = df, function(x) unique(na.omit(x)), na.action = na.pass)
id session string1 string2
1 a s1 first_string1 second_string2
2 b s2 first_string3 second_string4
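The na.action = na.pass argument matters in that call: the formula method of aggregate defaults to na.action = na.omit, which removes every row that has an NA in any variable before grouping. In this data only row 3 has no NA, so without na.pass group b disappears. A small sketch comparing the two calls (without_pass and with_pass are illustrative names):

```r
df <- data.frame(
  id      = c("a", "a", "a", "b", "b", "b"),
  session = c("s1", "s1", "s1", "s2", "s2", "s2"),
  string1 = c("first_string1", NA, "first_string1", NA, "first_string3", NA),
  string2 = c(NA, "second_string2", "second_string2", "second_string4", NA, NA),
  stringsAsFactors = FALSE
)

collapse_fun <- function(x) unique(na.omit(x))

# Default NA handling: incomplete rows are dropped before grouping.
without_pass <- aggregate(cbind(string1, string2) ~ id + session,
                          data = df, FUN = collapse_fun)

# na.pass keeps all rows, so collapse_fun sees the NAs and removes them itself.
with_pass <- aggregate(cbind(string1, string2) ~ id + session,
                       data = df, FUN = collapse_fun, na.action = na.pass)

nrow(without_pass)  # 1: only group a survives
nrow(with_pass)     # 2
```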
A bit clunky, but works:
library(tidyverse)
df %>%
group_by(id, session) %>%
summarise(string1 = paste(unique(string1[!is.na(string1)]), collapse = ""),
string2 = paste(unique(string2[!is.na(string2)]), collapse = ""))
Output:
id session string1 string2
<fct> <fct> <chr> <chr>
1 a s1 first_string1 second_string2
2 b s2 first_string3 second_string4
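The paste(..., collapse = "") approach has two quiet edge cases: a group where a column is entirely NA comes back as "" rather than NA, and a group with two different non-NA values gets them glued into one string. A base R sketch of a stricter alternative; the helper single_value and the extra group c / s3 row are hypothetical, added only to show the all-NA case:

```r
df <- data.frame(
  id      = c("a", "a", "a", "b", "b", "b"),
  session = c("s1", "s1", "s1", "s2", "s2", "s2"),
  string1 = c("first_string1", NA, "first_string1", NA, "first_string3", NA),
  string2 = c(NA, "second_string2", "second_string2", "second_string4", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical extra group whose string2 is entirely NA.
df2 <- rbind(df, data.frame(id = "c", session = "s3", string1 = "only_string1",
                            string2 = NA_character_, stringsAsFactors = FALSE))
key_cols   <- c("id", "session")
value_cols <- setdiff(names(df2), key_cols)

# Hypothetical helper: the single distinct non-NA value of x, NA if x is all NA,
# and an error if the group holds conflicting values.
single_value <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 0) return(NA_character_)
  if (length(vals) > 1) stop("conflicting non-NA values: ", paste(vals, collapse = ", "))
  vals
}

strict <- aggregate(df2[value_cols], by = df2[key_cols], FUN = single_value)
pasted <- aggregate(df2[value_cols], by = df2[key_cols],
                    FUN = function(x) paste(unique(x[!is.na(x)]), collapse = ""))

strict$string2  # "second_string2" "second_string4" NA
pasted$string2  # "second_string2" "second_string4" ""
```

Because single_value stops on conflicting values, it also acts as a check of the assumption stated in the question.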