xiaodai
xiaodai

Reputation: 16004

How to pivot two columns at a time?

Consider this input and I wish to pivot val1 and val2 using cname as column name.

wide = DataFrame(x = 1:12,
       a  = 2:13,
       b  = 3:14,
       val1  = randn(12),
       val2  = randn(12),
       cname = repeat(["c", "d"], inner =6)
       )

12×6 DataFrame
│ Row │ x     │ a     │ b     │ val1      │ val2      │ cname  │
│     │ Int64 │ Int64 │ Int64 │ Float64   │ Float64   │ String │
├─────┼───────┼───────┼───────┼───────────┼───────────┼────────┤
│ 1   │ 1     │ 2     │ 3     │ 1.51014   │ -1.18548  │ c      │
│ 2   │ 2     │ 3     │ 4     │ 0.0845411 │ -0.370083 │ c      │
│ 3   │ 3     │ 4     │ 5     │ 0.826283  │ -1.00423  │ c      │
│ 4   │ 4     │ 5     │ 6     │ -0.53175  │ -1.16659  │ c      │
│ 5   │ 5     │ 6     │ 7     │ -1.77975  │ 0.336333  │ c      │
│ 6   │ 6     │ 7     │ 8     │ 0.632577  │ 0.236621  │ c      │
│ 7   │ 7     │ 8     │ 9     │ -0.681532 │ 1.14869   │ d      │
│ 8   │ 8     │ 9     │ 10    │ -0.775619 │ 0.393475  │ d      │
│ 9   │ 9     │ 10    │ 11    │ -0.533034 │ 0.059624  │ d      │
│ 10  │ 10    │ 11    │ 12    │ 0.496152  │ -1.23507  │ d      │
│ 11  │ 11    │ 12    │ 13    │ 0.834099  │ 2.12115   │ d      │
│ 12  │ 12    │ 13    │ 14    │ 0.532357  │ -0.369267 │ d      │

In the tidyverse, the pivot_wider function can do this

wide %>% pivot_wider(names_from = cname, values_from = c(val1,val2))


===  ===  ===  ==========  ==========  ==========  ==========
  x    a    b      val1_c      val1_d      val2_c      val2_d
===  ===  ===  ==========  ==========  ==========  ==========
  1    2    3   1.0174232          NA  -0.6611959          NA
  2    3    4   0.6590795          NA  -2.0954505          NA
  3    4    5   1.2939581          NA   1.6350356          NA
  4    5    6  -1.9395356          NA   0.7813238          NA
  5    6    7   0.3558087          NA   0.9789414          NA
  6    7    8   0.9859100          NA  -0.9803336          NA
  7    8    9          NA   0.4949224          NA  -0.0659333
  8    9   10          NA   0.5024755          NA  -0.2317832
  9   10   11          NA   1.6926897          NA  -0.3840687
 10   11   12          NA  -0.4324705          NA  -0.0901276
 11   12   13          NA  -0.6415260          NA   0.0014151
 12   13   14          NA   1.2406868          NA  -2.1959740
===  ===  ===  ==========  ==========  ==========  ==========

Julia's DataFrames.unstack does not work

For example

using DataFrames
unstack(wide, [:x, :a,:b], :cname, [:val1,:val2]) 

What is the R (data.table)/Scala/etc solution for comparison? Solutions from any language welcome.

But I don't want any python solutions as it clashes with another.

Upvotes: 1

Views: 78

Answers (1)

Bogumił Kamiński
Bogumił Kamiński

Reputation: 69869

Unfortunately this currently is not supported in DataFrames.jl in one call (but the feature was requested to be added to unstack, so it will be available in the future). Currently you have to do:

tmp = [unstack(wide, [:x, :a, :b], :cname, v, renamecols=x->v*"_"*x) for v in ["val1", "val2"]]
result = [tmp[1] tmp[2][:, 4:end]]

or

innerjoin([unstack(wide, [:x, :a, :b], :cname, v, renamecols=x->v*"_"*x) for v in ["val1", "val2"]]..., on=[:x, :a, :b])

Upvotes: 1

Related Questions