Dio
Dio

Reputation: 97

Expand a data frame by group

I have a big data frame which consists of 1000 data frames (500x500), and I created by following code:

setwd("user/all_csv")
archivos <- list.files(full.names = F)
big.df <- lapply(archivos, read.csv, header = TRUE) %>%
       set_names(archivos)%>%
       bind_rows(.id = 'grp')

The big.df looks like below (a small example):

grp           X    X1     X2     X5
2020_01_19    1    23     47     3
2020_01_19    2    13     45     54
2020_01_19    5    23     41     21
2020_01_20    1    65     32     19
2020_01_20    2    39     52     12
2020_01_20    5    43     76     90
...

How can I generate the output below?:

             1-X1   1-X2   1-X5    2-X1    2-X2    2-X5    5-X1    5-X2    5-X5
2020_01_19    23     47     3       13      45      54      23      41      21
2020_01_20    65     32     19      39      52      12      43      76      90
...

I don't really know how to proceed. Any help would be greatly appreciated.

Upvotes: 0

Views: 376

Answers (1)

AnilGoyal
AnilGoyal

Reputation: 26218

use tidyr::pivot_wider with names_glue argument as follows.

  • Store name of all variables (even 500) to be pivoted into a vector say cols
  • Use values_from = all_of(cols) as argument in pivot_wider
cols <- c('X1', 'X2', 'X5')
df %>% pivot_wider(id_cols = grp, names_from = X, values_from = all_of(cols), 
                  names_glue = '{X}-{.value}')

# A tibble: 2 x 10
  grp        `1-X1` `2-X1` `5-X1` `1-X2` `2-X2` `5-X2` `1-X5` `2-X5` `5-X5`
  <chr>       <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
1 2020_01_19     23     13     23     47     45     41      3     54     21
2 2020_01_20     65     39     43     32     52     76     19     12     90

If you want to use all columns except first two, use this

df %>% pivot_wider(id_cols = grp, names_from = X, values_from = !c(grp, X), 
                   names_glue = '{X}-{.value}')

# A tibble: 2 x 10
  grp        `1-X1` `2-X1` `5-X1` `1-X2` `2-X2` `5-X2` `1-X5` `2-X5` `5-X5`
  <chr>       <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
1 2020_01_19     23     13     23     47     45     41      3     54     21
2 2020_01_20     65     39     43     32     52     76     19     12     90

However, if you want to rearrange columns as shown in expected outcome, you may use names_vary = 'slowest' in pivot_wider function of tidyr 1.2.0.

Upvotes: 1

Related Questions