JerryN
JerryN

Reputation: 2506

r data table dcast order of new columns

The csv below is from a much longer data table, call it temp. I want to cast it to temp.wide with region_code as columns and with the vertical order of region_code (SAS, SSA, EUR, ...) as the order of the columns. I just noticed that dcast orders the new columns alphabetically.

         scenario region_code                  region_name    value
 1:          2010         SAS                   South Asia 61.17716
 2:          2010         SSA   Africa south of the Sahara 62.08588
 3:          2010         EUR                       Europe 63.76123
 4:          2010         LAC  Latin America and Caribbean 68.84806
 5:          2010         FSU          Former Soviet Union 59.04499
 6:          2010         EAP        East Asia and Pacific 64.00579
 7:          2010         NAM                North America 66.18235
 8:          2010         MEN Middle East and North Africa 58.03167
 9: SSP2-NoCC-REF         SAS                   South Asia 57.29973
10: SSP2-NoCC-REF         SSA   Africa south of the Sahara 65.14987
11: SSP2-NoCC-REF         EUR                       Europe 63.99204
12: SSP2-NoCC-REF         LAC  Latin America and Caribbean 68.21118
13: SSP2-NoCC-REF         FSU          Former Soviet Union 60.10807
14: SSP2-NoCC-REF         EAP        East Asia and Pacific 63.86103
15: SSP2-NoCC-REF         NAM                North America 65.97859
16: SSP2-NoCC-REF         MEN Middle East and North Africa 58.98356

temp = setDT(structure(list(scenario = c("2010", "2010", "2010", "2010", "2010", 
"2010", "2010", "2010", "SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF", 
"SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF", "SSP2-NoCC-REF", 
"SSP2-NoCC-REF"), region_code = c("SAS", "SSA", "EUR", "LAC", 
"FSU", "EAP", "NAM", "MEN", "SAS", "SSA", "EUR", "LAC", "FSU", 
"EAP", "NAM", "MEN"), region_name = c("South Asia", "Africa south of the Sahara", 
"Europe", "Latin America and Caribbean", "Former Soviet Union", 
"East Asia and Pacific", "North America", "Middle East and North Africa", 
"South Asia", "Africa south of the Sahara", "Europe", "Latin America and Caribbean", 
"Former Soviet Union", "East Asia and Pacific", "North America", 
"Middle East and North Africa"), value = c(61.1771623260257, 
62.0858809906661, 63.7612306428217, 68.84805628195, 59.0449875464304, 
64.0057851485101, 66.182351351389, 58.0316719859857, 57.299725759211, 
65.1498720847705, 63.9920412193261, 68.2111842947542, 60.1080745513644, 
63.86103368494, 65.9785850777114, 58.9835574681585)), .Names = c("scenario", 
"region_code", "region_name", "value"), row.names = c(NA, -16L
), class = "data.frame"))

Here's the code I used.

formula.wide <- "scenario ~ region_code"
  temp.wide <- data.table::dcast(
    data = temp,
    formula = formula.wide,
    value.var = "value")


        scenario      EAP      EUR      FSU      LAC      MEN      NAM      SAS      SSA
1:          2010 64.00579 63.76123 59.04499 68.84806 58.03167 66.18235 61.17716 62.08588
2: SSP2-NoCC-REF 63.86103 63.99204 60.10807 68.21118 58.98356 65.97859 57.29973 65.14987

The new column names are scenario, EAP, EUR, FSU, LAC, MEN, NAM, SAS, SSA.

I can grab the correct order from temp and then use setcolorder to give temp.wide the correct column order. But I was wondering if there some way to not have the new column order alphabetized.

Also, the help text for dcast says

Names for columns that are being cast are generated in the same order (separated by an underscore, _) from the (unique) values in each column mentioned in the formula RHS.

If I am understanding this correctly, I don't think it describes what dcast actually does. But I don't understand what the parenthetical phrase (separated by an underscore, _) means.

Upvotes: 1

Views: 1701

Answers (1)

Frank
Frank

Reputation: 66819

with the vertical order of region_code (SAS, SSA, EUR, ...) as the order of the columns

Just pass a factor with appropriate levels:

dcast(temp, scenario ~ factor(region_code, levels=unique(region_code)))

        scenario      SAS      SSA      EUR      LAC      FSU      EAP      NAM      MEN
1:          2010 61.17716 62.08588 63.76123 68.84806 59.04499 64.00579 66.18235 58.03167
2: SSP2-NoCC-REF 57.29973 65.14987 63.99204 68.21118 60.10807 63.86103 65.97859 58.98356

The documentation quoted in the OP sounds correct to me; in z ~ x + y -- x's unique values come before y's unique values in the order of the resulting column names.

Upvotes: 7

Related Questions