Likan Zhan
Likan Zhan

Reputation: 1076

Stack multiples columns into two columns in DataFrame.jl

Suppose my source Dataframe and target Dataframe are df1 and df2 as being illustrated in the following.

I'm wondering how can I convert df1 to df2 using stack? Thanks.


julia> df1 = DataFrame(x1 = 1:4, x2 = 5:8, y1 = Char.(65:68), y2 = Char.(69:72))
4×4 DataFrame
 Row │ x1     x2     y1    y2   
     │ Int64  Int64  Char  Char 
─────┼──────────────────────────
   1 │     1      5  A     E
   2 │     2      6  B     F
   3 │     3      7  C     G
   4 │     4      8  D     H

julia> df2 = DataFrame(x = [d1.x1; d1.x2], y = [d1.y1; d1.y2])
8×2 DataFrame
 Row │ x      y    
     │ Int64  Char 
─────┼─────────────
   1 │     1  A
   2 │     2  B
   3 │     3  C
   4 │     4  D
   5 │     5  E
   6 │     6  F
   7 │     7  G
   8 │     8  H

Upvotes: 0

Views: 216

Answers (1)

Peace Wang
Peace Wang

Reputation: 2419

What you want is not a regular stack. Obviously, you can stack for each x and y singly, then hcat together to get your df21. But here is a different try.

julia> long = stack(df1,All())
16×2 DataFrame
 Row │ variable  value
     │ String    Any
─────┼─────────────────
   1 │ x1        1
   2 │ x1        2
   3 │ x1        3
   4 │ x1        4
   5 │ x2        5
   6 │ x2        6
   7 │ x2        7
   8 │ x2        8
   9 │ y1        A
  10 │ y1        B
  11 │ y1        C
  12 │ y1        D
  13 │ y2        E
  14 │ y2        F
  15 │ y2        G
  16 │ y2        H

Turn all x1,x2,y1,y2 to x,x,y,y and add new unique id column

julia> long.variable = SubString.(long.variable,1,1);
julia> long.id = repeat(1:8,2);
julia> long
16×3 DataFrame
 Row │ variable   value  id
     │ SubStrin…  Any    Int64
─────┼─────────────────────────
   1 │ x          1          1
   2 │ x          2          2
   3 │ x          3          3
   4 │ x          4          4
   5 │ x          5          5
   6 │ x          6          6
   7 │ x          7          7
   8 │ x          8          8
   9 │ y          A          1
  10 │ y          B          2
  11 │ y          C          3
  12 │ y          D          4
  13 │ y          E          5
  14 │ y          F          6
  15 │ y          G          7
  16 │ y          H          8

Finally, unstack back

julia> unstack(long, :variable, :value)
8×3 DataFrame
 Row │ id     x    y
     │ Int64  Any  Any
─────┼─────────────────
   1 │     1  1    A
   2 │     2  2    B
   3 │     3  3    C
   4 │     4  4    D
   5 │     5  5    E
   6 │     6  6    F
   7 │     7  7    G
   8 │     8  8    H

Upvotes: 1

Related Questions