Jelmer

Reputation: 351

Aggregate rows with string values in R

I have a data frame df that contains only string values. I need to aggregate its rows by id and session and fill in the NA values. My real data frame has 50 columns; this is just a small example. You can assume that, for each combination of id and session, all non-NA values within a column (string1 or string2) are identical.

id <- c('a', 'a', 'a', 'b', 'b', 'b')
session <- c('s1', 's1', 's1', 's2', 's2', 's2')
string1 <- c('first_string1', NA, 'first_string1', NA, 'first_string3', NA)
string2 <- c(NA, 'second_string2', 'second_string2', 'second_string4', NA, NA)
df <- data.frame(id, session, string1, string2)

df

  id session       string1        string2
1  a      s1 first_string1           <NA>
2  a      s1          <NA> second_string2
3  a      s1 first_string1 second_string2
4  b      s2          <NA> second_string4
5  b      s2 first_string3           <NA>
6  b      s2          <NA>           <NA>

The final dataframe should look like this:

  id session       string1        string2
1  a      s1 first_string1 second_string2
2  b      s2 first_string3 second_string4

I have tried using the aggregate function, but I can't figure out how to get it working.

Upvotes: 1

Views: 867

Answers (3)

user10191355

Reputation:

With aggregate you can do something like this, passing a function that drops the NAs in each group and keeps the unique remaining values:

aggregate(df[c("string1", "string2")],
          by = list(id = df$id, session = df$session),
          function(x) unique(na.omit(x)))

#### OUTPUT ####

  id session       string1        string2
1  a      s1 first_string1 second_string2
2  b      s2 first_string3 second_string4
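
With 50 columns you probably don't want to type every column name, and unique(na.omit(x)) no longer gives one plain string per group when a group is entirely NA (it returns a zero-length vector there). The following is only a sketch under the question's assumption of at most one distinct non-NA value per group; first_non_na and value_cols are names I made up for illustration, and the id vector is inferred from the printed data frame:

```r
# Example data from the question (id inferred from the printed output).
id      <- c('a', 'a', 'a', 'b', 'b', 'b')
session <- c('s1', 's1', 's1', 's2', 's2', 's2')
string1 <- c('first_string1', NA, 'first_string1', NA, 'first_string3', NA)
string2 <- c(NA, 'second_string2', 'second_string2', 'second_string4', NA, NA)
df <- data.frame(id, session, string1, string2, stringsAsFactors = FALSE)

# Hypothetical helper: first non-NA value of a group, or NA if the group has none.
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else as.character(x[1])
}

# Every column except the grouping keys is treated as a value column.
value_cols <- setdiff(names(df), c("id", "session"))

result <- aggregate(df[value_cols],
                    by = df[c("id", "session")],
                    FUN = first_non_na)
print(result)
```

Because the helper always returns exactly one value, each aggregated column should stay an ordinary character vector even when some group has nothing but NAs.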

Base R's merge is another, perhaps slightly easier to understand, option:

unique(na.omit(merge(df[c("id", "session", "string1")],
                     df[c("id", "session", "string2")],
                     by = c("id", "session")
                     )))

#### OUTPUT #### 

  id session       string1        string2
1  a      s1 first_string1 second_string2
2  b      s2 first_string3 second_string4

Upvotes: 2

Humpelstielzchen

Reputation: 6441

Another option is:

library(dplyr)

df %>%
  group_by(id, session) %>%
  summarise_at(vars(starts_with("string")), ~unique(na.omit(.)))

# A tibble: 2 x 4
# Groups:   id [2]
  id    session string1       string2       
  <chr> <chr>   <chr>         <chr>         
1 a     s1      first_string1 second_string2
2 b     s2      first_string3 second_string4
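
In dplyr 1.0.0 and later, summarise_at is superseded by across(). A rough equivalent might look like the sketch below; it assumes dplyr >= 1.0.0 is installed, and it takes the first non-NA value per group instead of unique(na.omit(.)), which is my substitution so that an all-NA group yields NA rather than a zero-length result:

```r
library(dplyr)

# Example data from the question (id inferred from the printed output).
df <- data.frame(
  id      = c('a', 'a', 'a', 'b', 'b', 'b'),
  session = c('s1', 's1', 's1', 's2', 's2', 's2'),
  string1 = c('first_string1', NA, 'first_string1', NA, 'first_string3', NA),
  string2 = c(NA, 'second_string2', 'second_string2', 'second_string4', NA, NA),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(id, session) %>%
  # .x[!is.na(.x)][1] is the first non-NA value, or NA if the group has none.
  summarise(across(starts_with("string"), ~ .x[!is.na(.x)][1]),
            .groups = "drop")

print(result)
```

Setting .groups = "drop" returns an ungrouped tibble, so later steps don't inherit the grouping by accident.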

A base R solution:

aggregate(cbind(string1, string2) ~ id + session,
          data = df,
          FUN = function(x) unique(na.omit(x)),
          na.action = na.pass)

  id session       string1        string2
1  a      s1 first_string1 second_string2
2  b      s2 first_string3 second_string4

Upvotes: 1

denisafonin

Reputation: 1136

A bit clunky, but works:

library(tidyverse)

df %>%
  group_by(id, session) %>%
  summarise(string1 = paste(unique(string1[!is.na(string1)]), collapse = ""),
            string2 = paste(unique(string2[!is.na(string2)]), collapse = ""))

Output:

  id    session string1       string2       
  <fct> <fct>   <chr>         <chr>         
1 a     s1      first_string1 second_string2
2 b     s2      first_string3 second_string4
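
One caveat: paste(..., collapse = "") glues the values together if a group ever contains two different non-NA strings, so it may be worth checking the question's assumption first. Here is a minimal base R sketch of such a check; n_distinct_non_na, counts and assumption_holds are names I chose for illustration:

```r
# Example data from the question (id inferred from the printed output).
df <- data.frame(
  id      = c('a', 'a', 'a', 'b', 'b', 'b'),
  session = c('s1', 's1', 's1', 's2', 's2', 's2'),
  string1 = c('first_string1', NA, 'first_string1', NA, 'first_string3', NA),
  string2 = c(NA, 'second_string2', 'second_string2', 'second_string4', NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct non-NA values in a vector.
n_distinct_non_na <- function(x) length(unique(x[!is.na(x)]))

value_cols <- setdiff(names(df), c("id", "session"))

# One row per (id, session) group, holding the distinct-value count per column.
counts <- aggregate(df[value_cols],
                    by = df[c("id", "session")],
                    FUN = n_distinct_non_na)

# TRUE when no group has more than one distinct non-NA value in any column.
assumption_holds <- all(as.matrix(counts[value_cols]) <= 1)
print(assumption_holds)
```

If this prints FALSE, looking at counts should show which group and column hold more than one value.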

Upvotes: 0
