getting-there
getting-there

Reputation: 1409

R: Can you specify the order of variable columns from reshape/cast?

I'm using the cast function to create a wide formatted data frame. I'd like to be able to control the ordering of the columns that result from using cast. Is this possible?

In the example below, they are order: cogs_xdep, sales, sga

I would like to order them: sales, cogs_xdep, sga

The whole process actually starts with a wide formatted frame that I use melt to change into a long format before the cast.

An example is below

>rawdata_all
  coy_name gvkey_iid    factor X20130930 X20130831 X20130731 X20130630 X20130531    X20130430 X20130331 X20130228 X20130131 X20121231
1    Coy 1    111111     sales         1         2         3         4         5         6         7         8         9        10
2    Coy 2     22222     sales         2        12        22        32        42        52        62        72        82        92
3    Coy 3    333333     sales         3       103       203       303       403       503       603       703       803       903
4    Coy 1    111111 cogs_xdep         4         5         6         7         8         9        10        11        12        13
5    Coy 2     22222 cogs_xdep         5        15        25        35        45        55        65        75        85        95
6    Coy 3    333333 cogs_xdep         6       106       206       306       406       506       606       706       806       906
7    Coy 1    111111       sga         7         8         9        10        11        12        13        14        15        16
8    Coy 2     22222       sga         8        18        28        38        48        58        68        78        88        98
9    Coy 3    333333       sga         9       109       209       309       409       509       609       709       809       909
...

Melt to put it in long format

> non_data_cols <- 3       # There are 3 non-value columns

> master_long <- melt(rawdata_all, id=1:non_data_cols,measured=(non_data_cols+1):length(rawdata_all))

> master_long

    coy_name gvkey_iid    factor  variable value
1      Coy 1    111111     sales X20130930     1
2      Coy 2     22222     sales X20130930     2
3      Coy 3    333333     sales X20130930     3
4      Coy 1    111111 cogs_xdep X20130930     4
5      Coy 2     22222 cogs_xdep X20130930     5
6      Coy 3    333333 cogs_xdep X20130930     6
7      Coy 1    111111       sga X20130930     7
8      Coy 2     22222       sga X20130930     8
...

Finally cast on 'factor' to create wider data frame. (I also renamed 'variable' to 'date' and removed the 'X' from the start of the date values).

> master <- cast(master_long, ...~factor)
 coy_name gvkey_iid     date cogs_xdep sales  sga
1     Coy 1    111111 20130930         4     1    7
2     Coy 1    111111 20130831         5     2    8
3     Coy 1    111111 20130731         6     3    9
4     Coy 1    111111 20130630         7     4   10
5     Coy 1    111111 20130531         8     5   11
6     Coy 1    111111 20130430         9     6   12
7     Coy 1    111111 20130331        10     7   13
8     Coy 1    111111 20130228        11     8   14
9     Coy 1    111111 20130131        12     9   15
10    Coy 1    111111 20121231        13    10   16
...

I would ideally like to have the final 3 columns in the following order: sales, cogs_xdep, sga. cast appears to have arranged them alphabetically as you can see they are ordered in the desired way in both the original data frame and the long formatted data frame.

Any suggestions would be greatly appreciated. While it is easier enough to rearrange the columns with only 3, it's more cumbersome in the real situation of 30+ columns.

Thanks,

Upvotes: 3

Views: 4921

Answers (2)

Estatistics
Estatistics

Reputation: 946

I found here: dcast - order factors how you can tell dcast do not change the order of factors alphabetically.

   dff<-structure(list(ID_no = c(2, 2, 2, 2, 2, 29, 29, 29, 29, 29), 
Variable = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,  2L), 
  .Label = c("q1", "q2"    ), class = "factor"), 
  Levels = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), 
    .Label = c("vf1", "df2", "fd3", "re4", "fr5"    ), class = "factor"), 
  Frequency = c(2, 15, 41, 28, 20, 6,  0, 5, 27, 82), 
  Percentage = c(1.89, 14.15, 38.68, 26.42,  18.87, 5, 0, 4.17, 22.5, 68.33)), 
  .Names = c("ID_no", "Variable", "Levels", "Frequency", "Percentage"),
  row.names = c(NA, -10L), class = c("data.table", "data.frame"))





 ID_no  Variable Levels   Frequency   Percentage
 1:     2       q1    vf1         2       1.89
 2:     2       q1    df2        15      14.15
 3:     2       q1    fd3        41      38.68
 4:     2       q1    re4        28      26.42
 5:     2       q1    fr5        20      18.87
 6:    29       q2    vf1         6       5.00
 7:    29       q2    df2         0       0.00
 8:    29       q2    fd3         5       4.17
 9:    29       q2    re4        27      22.50
10:    29       q2    fr5        82      68.33

 dcast(setDT(dff), Variable~factor(Levels, levels=unique(Levels)), value.var=c("Percentage"))

Some random factor levels, with their number to indicate their order.

  Variable  vf1   df2   fd3   re4   fr5
1:       q1 1.89 14.15 38.68 26.42 18.87
2:       q2 5.00  0.00  4.17 22.50 68.33

Hope this help.

Upvotes: 0

Roland
Roland

Reputation: 132746

I don't see a reason, why column order should matter. However, you can always change the order like this:

master[,c(names(master)[1:3], as.character(unique(master_long$factor)))]

   coy_name gvkey_iid  variable sales cogs_xdep sga
1     Coy_1    111111 X20130930     1         4   7
2     Coy_1    111111 X20130831     2         5   8
3     Coy_1    111111 X20130731     3         6   9
4     Coy_1    111111 X20130630     4         7  10
5     Coy_1    111111 X20130531     5         8  11
6     Coy_1    111111 X20130430     6         9  12
7     Coy_1    111111 X20130331     7        10  13
8     Coy_1    111111 X20130228     8        11  14
9     Coy_1    111111 X20130131     9        12  15
10    Coy_1    111111 X20121231    10        13  16
11    Coy_2     22222 X20130930     2         5   8
12    Coy_2     22222 X20130831    12        15  18
13    Coy_2     22222 X20130731    22        25  28
14    Coy_2     22222 X20130630    32        35  38
15    Coy_2     22222 X20130531    42        45  48
16    Coy_2     22222 X20130430    52        55  58
17    Coy_2     22222 X20130331    62        65  68
18    Coy_2     22222 X20130228    72        75  78
19    Coy_2     22222 X20130131    82        85  88
20    Coy_2     22222 X20121231    92        95  98
21    Coy_3    333333 X20130930     3         6   9
22    Coy_3    333333 X20130831   103       106 109
23    Coy_3    333333 X20130731   203       206 209
24    Coy_3    333333 X20130630   303       306 309
25    Coy_3    333333 X20130531   403       406 409
26    Coy_3    333333 X20130430   503       506 509
27    Coy_3    333333 X20130331   603       606 609
28    Coy_3    333333 X20130228   703       706 709
29    Coy_3    333333 X20130131   803       806 809
30    Coy_3    333333 X20121231   903       906 909

Note that package reshape2 has mostly superseded reshape.

Upvotes: 2

Related Questions