Marshall
Marshall

Reputation: 85

Using pivot_wider with over 100 variables

Seeking some help on the use of pivot_wider. I have large dataset with over 100 variables, and 300K observations. I'm trying to spread the information out even further based on a key variable, but I may even be using the wrong approach. A more concise example:

> example <- data.frame(incident_num = c("X1", "X1", "X2", "X3", "X3", "X3", "X4"),
                        unit_num     = c("T1", "E2", "M1", "M3", "T5", "E6", "M5"))
> example
  incident_num unit_num
1           X1       T1
2           X1       E2
3           X2       M1
4           X3       M3
5           X3       T5
6           X3       E6
7           X4       M5

What I am trying to get to is this:

output
  incident_num unit_num_1 unit_num_2 unit_num_3
1           X1         T1         E2       <NA>
2           X2         M1       <NA>       <NA>
3           X3         M3         T5         E6
4           X4         M5       <NA>       <NA>

This also includes all the other variables associated with the unit_num. Any help would be greatly appreciated!

Upvotes: 1

Views: 174

Answers (2)

Yuriy Saraykin
Yuriy Saraykin

Reputation: 8880

library(tidyverse)

example %>% 
  group_by(incident_num) %>% 
  mutate(id = seq_along(incident_num)) %>% 
  pivot_wider(incident_num,
              names_from = id,
              values_from = unit_num,
              names_prefix = "unit_num_")

look at the result, did I understand correctly?

df <- data.frame(incident_num = c("X1", "X1", "X2", "X3", "X3", "X3", "X4"),
                 unit_num1     = c("T1", "E2", "M1", "M3", "T5", "E6", "M5"), 
                 unit_num2     = c("A1", "B2", "C1", "C3", "J5", "U6", "B5"))

df %>% 
  group_by(incident_num) %>% 
  mutate(id = row_number()) %>% 
  pivot_wider(incident_num,
              names_from = id,
              values_from = vars_select(names(df), starts_with("unit_num")))

Please show the result for your example. Do you expect this?

df %>% 
  group_by(incident_number) %>%
  mutate(id = row_number()) %>% 
  pivot_wider(incident_number, 
              names_from = id, 
              values_from = tidyselect::vars_select(names(df), -incident_number))

Upvotes: 3

akrun
akrun

Reputation: 886948

We can create a sequence column grouped by 'incident_num' and then do the pivot_wider

library(dplyr)
library(tidyr)
library(stringr)
example %>% 
    group_by(incident_num) %>%
    mutate(rn = str_c('unit_num_', row_number())) %>%
    ungroup %>%
    pivot_wider(names_from = rn, values_from = unit_num)
# A tibble: 4 x 4
#  incident_num unit_num_1 unit_num_2 unit_num_3
#  <fct>        <fct>      <fct>      <fct>     
#1 X1           T1         E2         <NA>      
#2 X2           M1         <NA>       <NA>      
#3 X3           M3         T5         E6        
#4 X4           M5         <NA>       <NA>      

Upvotes: 2

Related Questions