Reputation: 10131
Here's an example df:
df <- structure(list(x = 1:30, y = 101:130, g = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("x", "y", "g"), row.names = c(NA, -30L), class = "data.frame")
I would like to get the 10 lowest values of y for each group within the filtered data.
But
df2 <- df %>% filter(x>3) %>% group_by(g) %>% tail(y, n=10)
only returns the rows for the last group (C in this case):
Source: local data frame [10 x 3]
Groups: g
x y g
18 21 121 C
19 22 122 C
20 23 123 C
21 24 124 C
22 25 125 C
23 26 126 C
24 27 127 C
25 28 128 C
26 29 129 C
27 30 130 C
What am I doing wrong?
Upvotes: 14
Views: 10677
Reputation: 70316
Here are two other options:
df %>% filter(x>3) %>% group_by(g) %>% top_n(3, desc(y))
Here we make use of top_n
but use desc(y)
since we want the lowest y
values instead of the largest ("top") y
values.
df %>% filter(x>3) %>% group_by(g) %>% arrange(y) %>% filter(1:n() <= 10)
which is equal to
df %>% filter(x>3) %>% group_by(g) %>% arrange(y) %>% slice(1:10)
After the grouping, we sort each group by increasing y
and then select the first 10 rows per group (or less if there are not 10 rows in a group).
Since there was some confusion about lowest and last values to be selected: this answer selects the lowest values, not the last entries.
Upvotes: 7
Reputation: 49448
More compact syntax via data.table
:
library(data.table)
dt = as.data.table(df)
# original tail question
dt[x > 3, tail(y, 10), by = g]
# 10 smallest values of y
# many options for this, here's one:
dt[x > 3, head(sort(y), 10), by = g]
# here's another, trying to take advantage of setkey speed
setkey(dt, g, y)
dt[x > 3, head(y, 10), by = g]
Upvotes: 2
Reputation: 36084
You can use tail
inside do
.
df2 <- df %>% filter(x>3) %>% group_by(g) %>% do(tail(., n=10))
The use of .
is key for this to work. From the do
help page: "You can use . to refer to the current group."
Edit:
As @beginneR pointed out, I was focusing on how to use tail
in groups with dplyr
and missed the part of the question where the OP asked for the 10 lowest values of y
. To do this correctly would take the addition of arrange
. With tail
, this would mean arranging by descending order of y
.
df2 <- df %>% filter(x>3) %>% group_by(g) %>% arrange(desc(y)) %>% do(tail(., n=10))
Upvotes: 25
Reputation: 2950
moar answers! This is a good job for one of the "window functions" that calculates ranks.
df %>%
filter(x > 3) %>%
group_by(g) %>%
filter(y %>% min_rank <= 10)
Upvotes: 2
Reputation: 94237
Why should tail work with a grouped data frame? It doesn't know about groups.
Add an order column, select by that column, drop that column. Here I use 3 instead of 10 for compactness (and %.% because old version of dplyr):
> df %.% filter(x>3) %.% group_by(g) %.% mutate(i=order(y)) %.% filter(i <= 3) %.% select(-matches("i"))
Source: local data frame [9 x 3]
Groups: g
x y g
1 4 104 A
2 5 105 A
3 6 106 A
4 11 111 B
5 12 112 B
6 13 113 B
7 21 121 C
8 22 122 C
9 23 123 C
Upvotes: 1
Reputation: 173677
I don't know why tail
isn't working, but you might try this:
df %>%
filter(x > 3) %>%
group_by(g) %>%
filter(.,rank(desc(y),ties.method = "min") <= 10)
Upvotes: 2