Reputation: 596
I have a data frame that I would like to categorise the columns, across rows, as belonging to the first, second, third, fourth or fifth qunitile (a bit confusing I know but the example should clarify). This I have done but the problem is that, firstly not all factor levels are present in each variable and secondly, the factors are not ordered in the most logical way. Here are some test data.
x.df<-structure(list(Location = structure(1:6, .Label = c("Site A",
"Site B", "Site C", "Site D", "Site E", "Site F"), class = "factor"),
Var1 = c(78L, 5L, 85L, 87L, 89L, 82L), Var2 = c(98L, 5L,
67L, 92L, 3L, 44L), Var3 = c(30L, 54L, 56L, 3L, 31L, 58L),
Var4 = c(63L, 96L, 14L, 95L, 90L, 99L), Var5 = c(71L, 52L,
78L, 93L, 74L, 26L), Var6 = c(21L, 66L, 57L, 42L, 39L, 69L
), Var7 = c(97L, 42L, 84L, 46L, 86L, 46L), Var8 = c(100L,
99L, 6L, 41L, 94L, 20L), Var9 = c(84L, 82L, 26L, 91L, 38L,
80L), Var10 = c(8L, 50L, 23L, 92L, 46L, 1L)), .Names = c("Location",
"Var1", "Var2", "Var3", "Var4", "Var5", "Var6", "Var7", "Var8",
"Var9", "Var10"), class = "data.frame", row.names = c(NA, -6L
))
cut_fn<-function(x){cut(x,quantile(x, c(0.0,0.2,0.4,0.6,0.8,1.0)),include.lowest=TRUE, c("lowest","low","middle","high","highest"))}
r.df<-data.frame(t(apply(x.df[,-1], 1, cut_fn)))
Thus each row has two each of "highest","high", ..., "lowest".
r.df
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 middle highest low low middle lowest high highest high lowest
2 lowest lowest middle highest middle high low highest high low
3 highest high middle lowest high middle highest lowest low low
4 middle high lowest highest highest low low lowest middle high
5 high lowest lowest highest middle low high highest low middle
6 highest low middle highest low high middle lowest high lowest
str(r.df)
'data.frame': 6 obs. of 10 variables:
$ X1 : Factor w/ 4 levels "high","highest",..: 4 3 2 4 1 2
$ X2 : Factor w/ 4 levels "high","highest",..: 2 4 1 1 4 3
$ X3 : Factor w/ 3 levels "low","lowest",..: 1 3 3 2 2 3
$ X4 : Factor w/ 3 levels "highest","low",..: 2 1 3 1 1 1
$ X5 : Factor w/ 4 levels "high","highest",..: 4 4 1 2 4 3
$ X6 : Factor w/ 4 levels "high","low","lowest",..: 3 1 4 2 2 1
$ X7 : Factor w/ 4 levels "high","highest",..: 1 3 2 3 1 4
$ X8 : Factor w/ 2 levels "highest","lowest": 1 1 2 2 1 2
$ X9 : Factor w/ 3 levels "high","low","middle": 1 1 2 3 2 1
$ X10: Factor w/ 4 levels "high","low","lowest",..: 3 2 2 1 4 3
Ideally what I'd like is all variables having the (ordered) structure:
$ X1 : Factor w/ 5 levels "highest","high",..:
Upvotes: 2
Views: 218
Reputation: 24079
If I understand your question correctly, you would like to have each column ordered. This easiest way is to loop through all of the column converting them with the factor function with the ordered=TRUE option.
Try this:
#first create r.df with stringsAsFactors as false
r.df<-data.frame(t(apply(x.df[,-1], 1, cut_fn)), stringsAsFactors = FALSE)
#now loop across all of the columns creating an order factor list
#lowest=1 while highest =5
for(x in names(r.df)) {
r.df[[x]]<-factor(r.df[[x]], levels=c("lowest","low","middle","high","highest"), ordered=TRUE)}
}
Now each column will have all five levels arranged in the proper order.
Upvotes: 1