I have the following data frame:
mydata = data.frame(class = c("class 1", "class 2", "class 1", "class 2", "class 3",
                              "class 3", "class 1", "class 2", "class 1", "class 2",
                              "class 3", "class 1", "class 2", "class 3", "class 3"),
                    word = c("A", "B", "A", "C", "A", "B", "A", "C", "D", "F", "G", "A", "U",
                             "A", "U"),
                    weight = c(0.1, 0.2, 0.25, 0.01, 0.19, 0.27, 0.32,
                               0.04, 0.005, 0.111, 0.56, 0.056, 0.08,
                               0.099, 0.2345),
                    # factors, as in the output below (the default is FALSE since R 4.0.0)
                    stringsAsFactors = TRUE)
For each class, i.e. class 1, class 2 and class 3, I would like to select from mydata the three words (column word) with the highest weights (column weight). For my simple example, code that performs this task is:
x=table(mydata$class)
class=names(x)
y=mydata[mydata$class==class[1],]
y=y[order(y$weight,decreasing=TRUE),]
y=head(y,3)
z=mydata[mydata$class==class[2],]
z=z[order(z$weight,decreasing=TRUE),]
z=head(z,3)
w=mydata[mydata$class==class[3],]
w=w[order(w$weight,decreasing=TRUE),]
w=head(w,3)
The desired result is then:
> y$word
[1] A A A
Levels: A B C D F G U
> z$word
[1] B F U
Levels: A B C D F G U
> w$word
[1] G B U
Levels: A B C D F G U
If I had more classes, I would use a for loop and store the words in a list. Is there a simpler way to obtain this result?
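For reference, the for loop mentioned above could be generalized to any number of classes roughly like this. It is a minimal sketch: top_words_loop is a hypothetical helper name, and it returns the words as character vectors rather than factors.

```r
# Example data from the question
mydata <- data.frame(
  class = c("class 1", "class 2", "class 1", "class 2", "class 3",
            "class 3", "class 1", "class 2", "class 1", "class 2",
            "class 3", "class 1", "class 2", "class 3", "class 3"),
  word = c("A", "B", "A", "C", "A", "B", "A", "C", "D", "F", "G", "A", "U", "A", "U"),
  weight = c(0.1, 0.2, 0.25, 0.01, 0.19, 0.27, 0.32,
             0.04, 0.005, 0.111, 0.56, 0.056, 0.08, 0.099, 0.2345),
  stringsAsFactors = TRUE
)

# Hypothetical helper: top n words per class, collected in a named list
top_words_loop <- function(df, n = 3) {
  classes <- sort(unique(as.character(df$class)))
  result <- vector("list", length(classes))
  names(result) <- classes
  for (cl in classes) {
    sub <- df[df$class == cl, ]                          # rows of this class
    sub <- sub[order(sub$weight, decreasing = TRUE), ]   # highest weight first
    result[[cl]] <- as.character(head(sub$word, n))      # keep the first n words
  }
  result
}

top_words_loop(mydata)
# class 1: "A" "A" "A"; class 2: "B" "F" "U"; class 3: "G" "B" "U"
```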
You can try using dplyr
library(dplyr)
mydata %>%
group_by(class) %>%
arrange(class, desc(weight)) %>%
slice(1:3)
# class word weight
# <fct> <fct> <dbl>
# 1 class 1 A 0.32
# 2 class 1 A 0.25
# 3 class 1 A 0.1
# 4 class 2 B 0.2
# 5 class 2 F 0.111
# 6 class 2 U 0.08
# 7 class 3 G 0.56
# 8 class 3 B 0.27
# 9 class 3 U 0.234
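Edit: the slice function selects the first 3 rows (in this case) of each group in the grouped data frame. Unlike top_n, slice does not keep ties: if several rows share the third-highest weight, slice returns only one of them, whereas top_n returns all of them.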
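To make the difference concrete, here is a base R imitation of the two behaviours on a hypothetical toy data frame with a tie at the cutoff. It assumes top_n's documented behaviour of keeping every row whose min-rank is within n (so it can return more than n rows), while slice(1:3) on sorted data always keeps exactly three rows.

```r
# Hypothetical toy data: R and S tie for third place at weight 0.2
toy <- data.frame(word = c("P", "Q", "R", "S"),
                  weight = c(0.5, 0.3, 0.2, 0.2))

# slice(1:3)-style: sort by weight and keep exactly the first 3 rows
first_n <- head(toy[order(toy$weight, decreasing = TRUE), ], 3)

# top_n()-style: keep every row whose "min" rank of -weight is at most 3
keep_ties <- toy[rank(-toy$weight, ties.method = "min") <= 3, ]

nrow(first_n)   # 3 (one of the tied rows is dropped)
nrow(keep_ties) # 4 (both tied rows are kept)
```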
Using data.table:
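library(data.table)
setDT(mydata)  # convert mydata to a data.table by reference
# within each class, sort every column by decreasing weight and keep the first 3 values
dt = mydata[, lapply(.SD, function(x) x[order(weight, decreasing = TRUE)][1:3]), keyby = class]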
> dt
class word weight
1: class 1 A 0.3200
2: class 1 A 0.2500
3: class 1 A 0.1000
4: class 2 B 0.2000
5: class 2 F 0.1110
6: class 2 U 0.0800
7: class 3 G 0.5600
8: class 3 B 0.2700
9: class 3 U 0.2345
Here is a base R approach that gives output similar to what you asked for:
lapply(split(mydata, mydata$class), function(x) {
  x[order(x$weight, decreasing = TRUE), "word"][1:3]
})
$`class 1`
[1] A A A
Levels: A B C D F G U
$`class 2`
[1] B F U
Levels: A B C D F G U
$`class 3`
[1] G B U
Levels: A B C D F G U
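If you prefer a single data frame, like the dplyr and data.table outputs, over a list, one base R sketch without extra packages is to sort once and number the rows within each class with ave(). The helper column pos is an illustrative name, not part of the question.

```r
# Example data from the question
mydata <- data.frame(
  class = c("class 1", "class 2", "class 1", "class 2", "class 3",
            "class 3", "class 1", "class 2", "class 1", "class 2",
            "class 3", "class 1", "class 2", "class 3", "class 3"),
  word = c("A", "B", "A", "C", "A", "B", "A", "C", "D", "F", "G", "A", "U", "A", "U"),
  weight = c(0.1, 0.2, 0.25, 0.01, 0.19, 0.27, 0.32,
             0.04, 0.005, 0.111, 0.56, 0.056, 0.08, 0.099, 0.2345),
  stringsAsFactors = TRUE
)

# Sort by class, then by decreasing weight within each class
sorted <- mydata[order(mydata$class, -mydata$weight), ]

# Position of each row within its class: 1, 2, 3, ... restarting per class
sorted$pos <- ave(sorted$weight, sorted$class, FUN = seq_along)

# Keep the first three rows of each class and drop the helper column
top3 <- sorted[sorted$pos <= 3, c("class", "word", "weight")]
rownames(top3) <- NULL
top3
```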