Warwick Wang
Warwick Wang

Reputation: 199

Julia tranpose grouped data in DataFrames?

ds = Dataset(group = repeat(1:3, inner = 2),
                                    b = repeat(1:2, inner = 3),
                                    c = repeat(1:1, inner = 6),
                                    d = repeat(1:6, inner = 1),
                                    e = string.('a':'f'))

In inmemorydatasets package, we can transpose grouped data like this.

#transpose by group
transpose(groupby(ds, :group), 2:4)

How can do I do this in DataFrames packages?

How can do I do this in R?

result:

Row │ group  variable  1       2      
     │ Int64  String    Int64?  Int64? 
─────┼─────────────────────────────────
   1 │     1  b              1       1
   2 │     1  c              1       1
   3 │     1  d              1       2
   4 │     2  b              1       2
   5 │     2  c              1       1
   6 │     2  d              3       4
   7 │     3  b              2       2
   8 │     3  c              1       1
   9 │     3  d              5       6

Upvotes: 4

Views: 93

Answers (1)

Dan Getz
Dan Getz

Reputation: 18227

Answer (attempt) regarding Julia DataFrames part of the question:

First creating the DataFrame:

df = DataFrame(group = repeat(1:3, inner = 2),
               b = repeat(1:2, inner = 3),
               c = repeat(1:1, inner = 6),
               d = repeat(1:6, inner = 1),
               e = string.('a':'f'))

Next, since the transpose operation depends on row ordering, we fix a row ordering in the groups:

julia> ordereddf = transform(DataFrames.groupby(df, :group),"group" => (x->1:length(x)) => "rn")[:,Not(:e)]
6×5 DataFrame
 Row │ group  b      c      d      rn    
     │ Int64  Int64  Int64  Int64  Int64 
─────┼───────────────────────────────────
   1 │     1      1      1      1      1
   2 │     1      1      1      2      2
   3 │     2      1      1      3      1
   4 │     2      2      1      4      2
   5 │     3      2      1      5      1
   6 │     3      2      1      6      2

Finally, the stack - unstack combo, does the transposing bit:

julia> sort!(unstack(stack(ordereddf,[:b,:c,:d]),:rn, :value),:group)
9×4 DataFrame
 Row │ group  variable  1       2      
     │ Int64  String    Int64?  Int64? 
─────┼─────────────────────────────────
   1 │     1  b              1       1
   2 │     1  c              1       1
   3 │     1  d              1       2
   4 │     2  b              1       2
   5 │     2  c              1       1
   6 │     2  d              3       4
   7 │     3  b              2       2
   8 │     3  c              1       1
   9 │     3  d              5       6

Feels like there might be easier ways to do this, but in general, transpose is rarely appropriate for database-like tables, and if it is appropriate, then maybe a matrix should have been used to store information in the first place.

The R part is left for someone else to answer.

Upvotes: 3

Related Questions