Reputation: 600

Ignoring NAs while aggregating character vector

I have this very simple dataset.

id <- c(12, 13, 26, 45, 55, 66)
group <- c("one", "one", "two", "two", "three", "three")
txt <- c("quick", "brown", NA, "fox", NA, NA)

df <- data.frame(id, group, txt)
df
> df
  id group   txt
1 12   one quick
2 13   one brown
3 26   two  <NA>
4 45   two   fox
5 55 three  <NA>
6 66 three  <NA>

I want to aggregate by group. I am trying this,

df |> 
  dplyr::group_by(group) |> 
  dplyr::summarise(txt = paste(txt, collapse = ", "))

# A tibble: 3 × 2
  group txt         
  <chr> <chr>       
1 one   quick, brown
2 three NA, NA      
3 two   NA, fox

As we see the NA are outed as characters. Secondly, three comes after one in the group column. I want to have something like this,

  group          txt
1   one quick, brown
2   two          fox
3 three         <NA>

Any idea how to fix this simple problem?

Upvotes: 1

Answers (3)

G. Grothendieck

Reputation: 270045

Check if all values are NA and if so return NA and otherwise remove the NA's and use toString on the remaining elements.

library(dplyr)

df %>%
  summarize(txt = if (all(is.na(txt))) NA else toString(na.omit(txt)), .by=group)

A variation of that approach which is slightly more compact is to compute toString as above but then use Find to return it or NULL if it is the empty string. ... %||% NA then converts NULL to NA.

df %>%
  summarize(txt = Find(nzchar, toString(na.omit(txt))) %||% NA, .by = group)

Upvotes: 0

Friede

Reputation: 7979

Base R

aggregate(txt~group, df, na.action = 'na.pass', 
          FUN = \(i) {i = na.omit(unique(i)); if (length(i)) i else NA})[
            order(unique(df$group)), ]

  group          txt
1   one quick, brown
3   two          fox
2 three           NA

Upvotes: 2

Ronak Shah

Reputation: 389205

You may either handle it yourself by code -

library(dplyr)

df |> 
  dplyr::summarise(txt = if(all(is.na(txt))) NA else toString(na.omit(txt)), .by = group)

   group          txt
#1   one quick, brown
#2   two          fox
#3 three         <NA>

Or use hablar::max_ which does the same thing.

df |> 
  dplyr::summarise(txt = toString(hablar::max_(txt)), .by = group)

Upvotes: 2

Ignoring NAs while aggregating character vector

Answers (3)

Related Questions