Reputation: 363
I've got a dataframe with several (numeric) columns, and want to make a new dataframe whose columns are the ranks of the originals.
> df <- data.frame(id=LETTERS[1:10],
wheat=c(123,234,345,456,678,987,876,654,432,321),
barley=c(135,975,246,864,357,753,468,642,579,531))
> df
id wheat barley
1 A 123 135
2 B 234 975
3 C 345 246
4 D 456 864
5 E 678 357
6 F 987 753
7 G 876 468
8 H 654 642
9 I 432 579
10 J 321 531
> rankeddf <- transform(df, wheat=rank(wheat), barley=rank(barley))
> rankeddf
id wheat barley
1 A 1 1
2 B 2 10
3 C 4 2
4 D 6 9
5 E 8 3
6 F 10 8
7 G 9 4
8 H 7 7
9 I 5 6
10 J 3 5
The thing is, the number and names of the columns vary. I have a vector that specifies them:
cols <- c("wheat", "barley")
How can I construct the transform() statement on the fly? Or even loop through the cols vector, applying a transform() statement once on each iteration? I'm guessing the answer is going to have something to do with eval() or evalq(), but I haven't quite got my head around them yet. For instance,
> rankeddf2 <- df
> for (col in cols) {rankeddf2 <- transform(rankeddf2, evalq(paste(col,"=rank(",col,")",sep="")))}
> rankeddf2
id wheat barley
1 A 123 135
2 B 234 975
3 C 345 246
4 D 456 864
5 E 678 357
6 F 987 753
7 G 876 468
8 H 654 642
9 I 432 579
10 J 321 531
doesn't do the trick.
Alternatively, is there another way of doing this?
Upvotes: 3
Views: 136
Reputation: 174813
I like to think of transform() and the related with() and within() as syntactic sugar that is useful interactively at the top level, but quite often subsetting and replacement via '['(), '[<-'() et al. are easier to use for jobs such as this:
> df2 <- df ## copy
> df2[, cols] <- apply(df[, cols], 2, rank)
> df2
id wheat barley
1 A 1 1
2 B 2 10
3 C 4 2
4 D 6 9
5 E 8 3
6 F 10 8
7 G 9 4
8 H 7 7
9 I 5 6
10 J 3 5
'['() and '[<-'() already do what you want, so you are trying to force transform() to do something that is already implemented much more easily with the subsetting and replacement functions.
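For completeness, here is a minimal sketch of both routes. It assumes the columns named in cols are genuinely numeric, so the sample data is rebuilt inside the block to keep it self-contained. The names ranked1 and ranked2 are only illustrative. The first route uses list-style replacement with lapply(), which also works when cols names a single column. The second shows that a transform() call can be assembled programmatically with do.call(), should you really want it:

```r
# Sample data with numeric columns (same values as in the question)
df <- data.frame(id     = LETTERS[1:10],
                 wheat  = c(123, 234, 345, 456, 678, 987, 876, 654, 432, 321),
                 barley = c(135, 975, 246, 864, 357, 753, 468, 642, 579, 531))
cols <- c("wheat", "barley")

# Route 1: list-style replacement. df[cols] stays a data frame even when
# cols has length one, so lapply() always iterates over columns.
ranked1 <- df
ranked1[cols] <- lapply(df[cols], rank)

# Route 2: build the transform() call on the fly. The named list of rank
# vectors becomes the wheat = ..., barley = ... arguments of transform().
ranked2 <- do.call(transform, c(list(df), lapply(df[cols], rank)))
```

Both routes leave id untouched and replace only the columns listed in cols. Route 1 is the one I would reach for in scripts.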
Upvotes: 4
Reputation: 179428
You can do this by using lapply() and rank():
as.data.frame(lapply(df[, cols], rank))
wheat barley
1 1 1
2 2 10
3 4 2
4 6 9
5 8 3
6 10 8
7 9 4
8 7 7
9 5 6
10 3 5
OK, so in the process you lose the first column, but that's easy to add back:
data.frame(id=df[[1]], lapply(df[, cols], rank))
id wheat barley
1 A 1 1
2 B 2 10
3 C 4 2
4 D 6 9
5 E 8 3
6 F 10 8
7 G 9 4
8 H 7 7
9 I 5 6
10 J 3 5
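If you need this repeatedly, the same idea can be wrapped in a small function. This is only a sketch: rank_columns is a hypothetical name, not an existing function, and the check for unknown column names is my own addition. It replaces the ranked columns in place, so every other column (such as id) is kept without having to add it back by hand:

```r
# Hypothetical helper: replace the columns named in `cols` by their ranks.
# ties.method is passed straight through to rank().
rank_columns <- function(data, cols, ties.method = "average") {
  unknown <- setdiff(cols, names(data))
  if (length(unknown) > 0) {
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  }
  data[cols] <- lapply(data[cols], rank, ties.method = ties.method)
  data
}

# Usage with the data from the question
df <- data.frame(id     = LETTERS[1:10],
                 wheat  = c(123, 234, 345, 456, 678, 987, 876, 654, 432, 321),
                 barley = c(135, 975, 246, 864, 357, 753, 468, 642, 579, 531))
cols <- c("wheat", "barley")
rankeddf <- rank_columns(df, cols)
```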
Upvotes: 6