cboettig
cboettig

Reputation: 12677

tidyverse spread with duplicate keys and fill NAs based on unique keys

Consider the minimal example:

library(tidyverse)
ex <-tribble(
  ~id, ~property, ~value,
  1,    "A",      9,
  1,    "A",      8,
  1,    "B",      7,
  2,    "A",      6,
  2,    "B",      5
)

My goal is to spread the property into columns to get this table:

tribble(
  ~id, ~A, ~B,
  1,  9,  7,
  1,  8,  7,
  2,  6,  5
)

Grouping by id and property and adding a key gets close but leaves NA:

## almost but not quite
ex %>% 
  group_by(id, property) %>%
  mutate(key = row_number()) %>%
  spread(property, value) %>% 
  select(-key) -> X
X

Gives:

     id     A     B
1     1     9     7
2     1     8    NA
3     2     6     5

I can solve this in the minimal example by splitting out the above by each property, dropping NAs, and joining back by id:

inner_join(
  na.omit(select(X, id, A)),
  na.omit(select(X, id, B))
)

but clearly that does not generalize to arbitrary set of properties. What is a better tidyverse strategy to do this?

NOTE: Several previous questions speak to the first half of this, e.g. constructing the key column so that spread does not fail, but couldn't see something addressing the NAs.

Upvotes: 4

Views: 696

Answers (1)

acylam
acylam

Reputation: 18661

You can use fill from tidyr:

library(dplyr)
library(tidyr)

ex %>% 
  group_by(id, property) %>%
  mutate(key = row_number()) %>%
  spread(property, value) %>% 
  select(-key) %>%
  group_by(id) %>%
  fill(-id)

Result:

# A tibble: 3 x 3
# Groups:   id [2]
     id     A     B
  <dbl> <dbl> <dbl>
1     1     9     7
2     1     8     7
3     2     6     5

Upvotes: 5

Related Questions