I have a lot of units, each measured several times.
>df
Item value year
1 20 1990
1 20 1991
2 30 1990
2 15 1990
2 5 1991
3 10 1991
4 15 1990
5 10 1991
5 5 1991
I am trying to use dplyr to remove items that have a low number of observations. On this toy data, let's say I want to remove every item with fewer than 2 observations.
library(dplyr)
# count rows per Item, then keep the items seen more than once
df <- df %>%
  group_by(Item) %>%
  tally() %>%
  filter(n > 1)
Item n
1 2
2 3
5 2
The problem is that I would like to expand this back to the original rows, keeping only the items that pass this filter. I tried the ungroup() command, but that seems to have an effect only when grouping by two variables. How can I filter by item counts and then get my original variables, i.e. value and year, back? It should look like this:
>df
Item value year
1 20 1990
1 20 1991
2 30 1990
2 15 1990
2 5 1991
5 10 1991
5 5 1991
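If you want to keep the counting step from the attempt above, one option is to store the count table in a separate object instead of overwriting df, then use dplyr's semi_join() to keep only the rows of df whose Item appears in that table. A minimal sketch, assuming the example data above; the object names keep and result are illustrative, not from the original post:

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  Item  = c(1L, 1L, 2L, 2L, 2L, 3L, 4L, 5L, 5L),
  value = c(20L, 20L, 30L, 15L, 5L, 10L, 15L, 10L, 5L),
  year  = c(1990L, 1991L, 1990L, 1990L, 1991L, 1991L, 1990L, 1991L, 1991L)
)

# Count rows per Item and keep items seen more than once,
# storing the counts separately so df is not overwritten
keep <- df %>%
  group_by(Item) %>%
  tally() %>%
  filter(n > 1)

# semi_join() keeps the rows of df whose Item appears in `keep`,
# without adding any columns (such as n) from `keep`
result <- df %>%
  semi_join(keep, by = "Item")

print(result)
```

Because semi_join() is a filtering join, it never duplicates rows of df and keeps them in their original order.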
More simply, use dplyr's row_number() inside a grouped filter(), which keeps every original row of the groups that pass:
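library(dplyr)

# Example data typed in, since read.table("clipboard") is platform-dependent
df <- data.frame(
  Item  = c(1L, 1L, 2L, 2L, 2L, 3L, 4L, 5L, 5L),
  value = c(20L, 20L, 30L, 15L, 5L, 10L, 15L, 10L, 5L),
  year  = c(1990L, 1991L, 1990L, 1990L, 1991L, 1991L, 1990L, 1991L, 1991L))
df %>%
  group_by(Item) %>%
  filter(max(row_number()) > 1) %>%  # keep items with more than one row
  ungroup()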
# A tibble: 7 x 3
Item value year
<int> <int> <int>
1 1 20 1990
2 1 20 1991
3 2 30 1990
4 2 15 1990
5 2 5 1991
6 5 10 1991
7 5 5 1991
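Within a group, row_number() runs from 1 to the group size, so max(row_number()) equals n(), dplyr's group-size helper. The sketch below writes the same filter with n(), and also shows an add_count() variant that attaches the per-Item count as a column n, filters on it, then drops it. It assumes the example data from the question and a dplyr version that provides add_count(); the names by_n and by_count are illustrative:

```r
library(dplyr)

df <- data.frame(
  Item  = c(1L, 1L, 2L, 2L, 2L, 3L, 4L, 5L, 5L),
  value = c(20L, 20L, 30L, 15L, 5L, 10L, 15L, 10L, 5L),
  year  = c(1990L, 1991L, 1990L, 1990L, 1991L, 1991L, 1990L, 1991L, 1991L)
)

# n() returns the number of rows in the current group
by_n <- df %>%
  group_by(Item) %>%
  filter(n() > 1) %>%
  ungroup()

# add_count() appends the per-Item count as column `n` without collapsing rows,
# so the original value and year columns survive the filter
by_count <- df %>%
  add_count(Item) %>%
  filter(n > 1) %>%
  select(-n)

print(by_n)
```

Both versions keep the same 7 rows; n() simply states the intent more directly than max(row_number()).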