Mark

Reputation: 1769

Group rows of data frame and work on them

I have the following data frame:

mydata=data.frame(class = c("class 1", "class 2", "class 1", "class 2", "class 3",
                            "class 3", "class 1", "class 2", "class 1", "class 2",
                            "class 3", "class 1", "class 2", "class 3", "class 3"), 
                  word=c("A","B","A","C","A","B","A","C","D", "F", "G", "A", "U",
                         "A", "U"), weight=c(0.1,0.2,0.25,0.01,0.19,0.27,0.32,
                                             0.04,0.005,0.111,0.56,0.056,0.08,
                                             0.099,0.2345))

For each class (class 1, class 2, class 3), I would like to select from mydata the three words (column word) with the highest weights (column weight). For my simple example, code that does this is:

x=table(mydata$class)
class=names(x)
y=mydata[mydata$class==class[1],]
y=y[order(y$weight,decreasing=TRUE),]
y=head(y,3)

z=mydata[mydata$class==class[2],]
z=z[order(z$weight,decreasing=TRUE),]
z=head(z,3)

w=mydata[mydata$class==class[3],]
w=w[order(w$weight,decreasing=TRUE),]
w=head(w,3)

And then, the desired result is:

> y$word
[1] A A A
Levels: A B C D F G U
> z$word
[1] B F U
Levels: A B C D F G U
> w$word
[1] G B U
Levels: A B C D F G U

If I had more classes, I would use a for loop and store the words in a list. But is there a simpler way to obtain this result?

Upvotes: 1

Views: 44

Answers (3)

Ric S

Reputation: 9247

You can try using dplyr

library(dplyr)

mydata %>% 
  group_by(class) %>% 
  arrange(class, desc(weight)) %>% 
  slice(1:3)

#   class   word  weight
#   <fct>   <fct>  <dbl>
# 1 class 1 A      0.32 
# 2 class 1 A      0.25 
# 3 class 1 A      0.1  
# 4 class 2 B      0.2  
# 5 class 2 F      0.111
# 6 class 2 U      0.08 
# 7 class 3 G      0.56 
# 8 class 3 B      0.27 
# 9 class 3 U      0.234

Edit: slice(1:3) selects the first three rows of each group in the grouped data frame. Unlike top_n, which keeps every row tied at the cutoff, slice returns at most three rows per group, so ties are dropped.
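To make the tie behaviour concrete, here is a minimal sketch. It assumes dplyr 1.0.0 or later, where slice_max() is available; the toy data frame is hypothetical and not part of the question.

```r
library(dplyr)

# Hypothetical toy data: two rows tie for the third-highest weight
toy <- data.frame(
  class  = c("a", "a", "a", "a"),
  word   = c("W", "X", "Y", "Z"),
  weight = c(0.5, 0.4, 0.3, 0.3)
)

# with_ties = FALSE: at most n rows per group, like arrange() + slice(1:3)
no_ties <- toy %>%
  group_by(class) %>%
  slice_max(weight, n = 3, with_ties = FALSE)

# with_ties = TRUE (the default): rows tied at the cutoff are all kept
ties <- toy %>%
  group_by(class) %>%
  slice_max(weight, n = 3)

nrow(no_ties)
nrow(ties)
```

If exactly three rows per class are wanted, with_ties = FALSE is probably the safer choice.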

Upvotes: 2

JDG

Reputation: 1364

Using data.table:

library(data.table)

setDT(mydata)  # convert the data frame to a data.table by reference
dt = mydata[, lapply(.SD, function(x) x[order(weight, decreasing = TRUE)][1:3]), keyby = class]

> dt
     class word weight
1: class 1    A 0.3200
2: class 1    A 0.2500
3: class 1    A 0.1000
4: class 2    B 0.2000
5: class 2    F 0.1110
6: class 2    U 0.0800
7: class 3    G 0.5600
8: class 3    B 0.2700
9: class 3    U 0.2345
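A possible alternative, shown only as a sketch: sort the whole table once and then take the head of each group. It rebuilds the question's data so that it runs on its own; the name top3 is illustrative. One difference from the version above is that head() does not pad a class with fewer than three rows with NA.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
mydata <- data.table(
  class  = c("class 1", "class 2", "class 1", "class 2", "class 3",
             "class 3", "class 1", "class 2", "class 1", "class 2",
             "class 3", "class 1", "class 2", "class 3", "class 3"),
  word   = c("A", "B", "A", "C", "A", "B", "A", "C", "D", "F", "G", "A", "U",
             "A", "U"),
  weight = c(0.1, 0.2, 0.25, 0.01, 0.19, 0.27, 0.32,
             0.04, 0.005, 0.111, 0.56, 0.056, 0.08,
             0.099, 0.2345)
)

# Sort by descending weight, then keep up to three rows per class;
# keyby sorts the result by class
top3 <- mydata[order(-weight), head(.SD, 3), keyby = class]
top3
```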

Upvotes: 2

Ian Campbell

Reputation: 24790

Here is a base R approach with similar output to your request.

lapply(split(mydata,mydata$class),function(x){x[order(x$weight,decreasing=TRUE),"word"][1:3]})
$`class 1`
[1] A A A
Levels: A B C D F G U

$`class 2`
[1] B F U
Levels: A B C D F G U

$`class 3`
[1] G B U
Levels: A B C D F G U
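One caveat with [1:3]: a class with fewer than three rows gets padded with NA. A minimal sketch of a variant that uses head() instead, on a hypothetical toy data frame (not the question's data) where one class has only two rows:

```r
# Hypothetical toy data: class "small" has only two rows
toy <- data.frame(
  class  = c("big", "big", "big", "big", "small", "small"),
  word   = c("A", "B", "C", "D", "E", "F"),
  weight = c(0.1, 0.4, 0.3, 0.2, 0.5, 0.9)
)

top_words <- lapply(split(toy, toy$class), function(x) {
  x <- x[order(x$weight, decreasing = TRUE), ]  # highest weight first
  head(x$word, 3)                               # up to three words, no NA padding
})

top_words
```

Here top_words$small holds two words rather than two words plus an NA.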

Upvotes: 1
