Reputation: 83
I want to delete the rows whose id doesn't appear in EVERY month listed in the month column. The dataset is as follows:
id month
1 01
2 01
3 01
1 02
2 02
1 03
2 03
For example, I want to delete id = 3 from the dataset, since it doesn't appear in month = 02.
I'm using R.
Thanks for your help.
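To make the answers reproducible, here is a minimal sketch of the example data as an R data frame named df (the name the answers use). It assumes month is stored as an integer, which matches the <int> column type shown in the dplyr output; if your months are character strings such as "01", the same filtering approaches still work.

```r
# Example data from the question; month is assumed to be an integer column
df <- data.frame(
  id    = c(1L, 2L, 3L, 1L, 2L, 1L, 2L),
  month = c(1L, 1L, 1L, 2L, 2L, 3L, 3L)
)

str(df)  # 7 rows: id 3 appears only in month 1
```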
Upvotes: 1
Views: 100
Reputation: 886938
Using dplyr:
library(dplyr)
df %>%
group_by(id) %>%
filter(n_distinct(month) == n_distinct(df$month)) %>%
ungroup
Output:
# A tibble: 6 × 2
id month
<int> <int>
1 1 1
2 2 1
3 1 2
4 2 2
5 1 3
6 2 3
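The dplyr filter keeps an id only if the number of distinct months in its group equals the number of distinct months in the whole data. A rough base-R equivalent of that test, sketched below with ave(), counts distinct months per id. Counting distinct months rather than rows matters: in this hypothetical variant of the data, id 3 has three rows (two in month 1, one in month 2), so a plain row count would wrongly match the three months, while the distinct-month count correctly rejects it.

```r
# Hypothetical data: id 3 has 3 rows but covers only months 1 and 2
df <- data.frame(
  id    = c(1L, 2L, 3L, 3L, 1L, 2L, 3L, 1L, 2L),
  month = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L)
)

n_months <- length(unique(df$month))  # distinct months in the whole data

# For every row, the number of distinct months seen for that row's id
months_per_id <- ave(df$month, df$id, FUN = function(m) length(unique(m)))

# Keep only rows whose id covers every month
result <- df[months_per_id == n_months, ]
result
```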
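Or using data.table. The month count for the whole table has to come from data_hh$month: inside a by = id group, .SD holds only that group's rows, so uniqueN(.SD$month) would always equal uniqueN(month) and nothing would be filtered.
library(data.table)
data_hh <- structure(list(id = c(18354L, 18815L, 19014L, 63960L, 72996L,
73930L), month = c(1, 1, 1, 1, 1, 1), value = c(113.33, 251.19,
160.15, 278.8, 254.39, 733.22), x1 = c(96.75, 186.78, 106.02,
195.23, 184.57, 473.92), x2 = c(1799.1, 5399.1, 1799.1, 1349.1,
2924.1, 2024.1), x3 = c(85.37, 74.36, 66.2, 70.02, 72.55, 64.63
), x4 = c(6.29, 4.65, 8.9, 20.66, 8.69, 36.22)), row.names = c(NA,
-6L), class = c("data.table", "data.frame"))
# keep an id's rows only if that id covers every month in the table
data_hh[, if (uniqueN(month) == uniqueN(data_hh$month)) .SD, by = id]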
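Another way to express the "present in every month" condition, sketched here in base R rather than data.table, is a cross-tabulation of id by month: an id qualifies when its row has a nonzero count in every month column. This is an alternative technique, not the data.table answer's method, shown on the question's example data.

```r
# Example data from the question
df <- data.frame(
  id    = c(1L, 2L, 3L, 1L, 2L, 1L, 2L),
  month = c(1L, 1L, 1L, 2L, 2L, 3L, 3L)
)

# Rows are ids, columns are months, cells are row counts
tab <- table(df$id, df$month)

# ids with at least one row in every month column
keep_ids <- as.integer(rownames(tab)[rowSums(tab > 0) == ncol(tab)])

filtered <- df[df$id %in% keep_ids, ]
filtered
```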
Upvotes: 0
Reputation: 51582
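You can split the ids by month and use Reduce with intersect to find the ids present in every month; every other id is removed:
remove <- setdiff(unique(df$id), Reduce(intersect, split(df$id, df$month)))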
df[!df$id %in% remove,]
id month
1 1 1
2 2 1
4 1 2
5 2 2
6 1 3
7 2 3
As @jay.sf mentioned, you need to assign the result back to your data frame:
df <- df[!df$id %in% remove,]
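Taking the intersection of the per-month id sets is what makes the check hold for every month. Chaining setdiff instead, as in Reduce(setdiff, ...), only looks for ids from the first month that are missing from later months, and an id that skips a middle month but reappears later drops out of that result. A small sketch with hypothetical data where id 3 is absent only from month 2:

```r
# Hypothetical data: id 3 appears in months 1 and 3 but not in month 2
df <- data.frame(
  id    = c(1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L),
  month = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L)
)

ids_by_month <- split(df$id, df$month)  # one id vector per month

# Chaining setdiff misses id 3 here, because it reappears in month 3
Reduce(setdiff, ids_by_month)
#> integer(0)

# ids present in every month = intersection of the per-month id sets
in_every_month <- Reduce(intersect, ids_by_month)
remove <- setdiff(unique(df$id), in_every_month)
remove
#> [1] 3

df <- df[!df$id %in% remove, ]
```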
Upvotes: 2