Reputation: 3195
In my dataset num:
structure(list(x1 = c(52L, 74L, 61L, 63L, 44L), x2 = c(32L, 96L,
83L, 35L, 95L), x3 = c(9L, 36L, 7L, 33L, 67L), x4 = c(1L, 2L,
3L, 2L, 3L), x5 = c(2017L, 2017L, 2017L, 2018L, 2018L)), .Names = c("x1",
"x2", "x3", "x4", "x5"), class = "data.frame", row.names = c(NA,
-5L))
there are two variables, x4 (number) and x5 (year). The problem: the values of x4 for 2017 do not all match the values of x4 for 2018. For example, in my reproducible example x4 takes the value 1 in 2017, but there is no 1 in 2018, so the row with x4 = 1 must be deleted from the dataset, along with any other x4 values that exist for 2017 but not for 2018. How can I do that?
In fact, in the original dataset 2017 has 406 numbers but 2018 has 1500. I need code that keeps only the 2017 numbers that also occur in 2018, while 2018 keeps its additional numbers (those beyond the 406, up to 1500). For example, with this data:
structure(list(x1 = c(52L, 74L, 61L, 63L, 44L, 44L), x2 = c(32L,
96L, 83L, 35L, 95L, 95L), x3 = c(9L, 36L, 7L, 33L, 67L, 67L),
x4 = c(1L, 2L, 3L, 2L, 3L, 1500L), x5 = c(2017L, 2017L, 2017L,
2018L, 2018L, 2018L)), .Names = c("x1", "x2", "x3", "x4",
"x5"), class = "data.frame", row.names = c(NA, -6L))
the desired output is:
x1 x2 x3 x4 x5
74 96 36 2 2017
61 83 7 3 2017
63 35 33 2 2018
44 95 67 3 2018
44 95 67 1500 2018
Upvotes: 0
Views: 75
Reputation: 887108
Here is one option with tidyverse
library(dplyr)
num %>%
group_by(x4) %>%
filter(n_distinct(x5) == n_distinct(.$x5))
# A tibble: 4 x 5
# Groups: x4 [2]
# x1 x2 x3 x4 x5
# <int> <int> <int> <int> <int>
#1 74 96 36 2 2017
#2 61 83 7 3 2017
#3 63 35 33 2 2018
#4 44 95 67 3 2018
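The filter keeps an x4 group only if it covers as many distinct years as the whole dataset. The same idea can be sketched in base R with ave(), here assuming the first example data from the question (the names n.years and res are only illustrative):

```r
# First example data from the question
num <- data.frame(
  x1 = c(52L, 74L, 61L, 63L, 44L),
  x2 = c(32L, 96L, 83L, 35L, 95L),
  x3 = c(9L, 36L, 7L, 33L, 67L),
  x4 = c(1L, 2L, 3L, 2L, 3L),
  x5 = c(2017L, 2017L, 2017L, 2018L, 2018L)
)

# For every row, count the distinct years in which its x4 value occurs
n.years <- ave(num$x5, num$x4, FUN = function(y) length(unique(y)))

# Keep rows whose x4 value occurs in every year present in the data
res <- num[n.years == length(unique(num$x5)), ]
res
```

This should drop only the row with x4 = 1, since that value occurs in 2017 but not in 2018.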
Upvotes: 1
Reputation: 50668
You can do the following:
# Get x4 values present for all years
x4.all <- Reduce(
function(a, b) intersect(a, b),
lapply(split(num, num$x5), function(x) x$x4))
# Select entries where x4 is an element of x4.all
subset(num, x4 %in% x4.all)
#x1 x2 x3 x4 x5
#2 74 96 36 2 2017
#3 61 83 7 3 2017
#4 63 35 33 2 2018
#5 44 95 67 3 2018
Explanation: We use Reduce(function(a, b) intersect(a, b), ...) to calculate the intersection of the x4 values across all x5 (year) groups; we then keep only the entries whose x4 value is in x4.all, i.e. present in all years.
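Note that an intersection across years also drops x4 values that occur only in 2018, such as 1500 in the question's second example. A minimal sketch for that case, assuming 2018 is the reference year and that every 2018 row should be kept (x4.2018 and res are illustrative names):

```r
# Second example data from the question (x4 = 1500 occurs only in 2018)
num <- data.frame(
  x1 = c(52L, 74L, 61L, 63L, 44L, 44L),
  x2 = c(32L, 96L, 83L, 35L, 95L, 95L),
  x3 = c(9L, 36L, 7L, 33L, 67L, 67L),
  x4 = c(1L, 2L, 3L, 2L, 3L, 1500L),
  x5 = c(2017L, 2017L, 2017L, 2018L, 2018L, 2018L)
)

# x4 values observed in the reference year (assumed to be 2018)
x4.2018 <- unique(num$x4[num$x5 == 2018])

# All 2018 rows pass trivially; 2017 rows pass only if their x4 also occurs in 2018
res <- num[num$x4 %in% x4.2018, ]
res
```

With this data the result should match the desired output in the question: the 2017 row with x4 = 1 is removed and the 2018 row with x4 = 1500 is kept.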
Upvotes: 3
Reputation: 5893
You could, for example, first retrieve the numbers that are present for every year and then index with them.
inds <- do.call(intersect, unname(by(num, num$x5, function(x) x$x4)))
num[num$x4 %in% inds, ]
x1 x2 x3 x4 x5
2 74 96 36 2 2017
3 61 83 7 3 2017
4 63 35 33 2 2018
5 44 95 67 3 2018
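One caveat: base intersect() takes exactly two vectors, so the do.call() form fits data with two years. If more years were present, a pairwise fold with Reduce() is one way to generalise. A sketch on hypothetical three-year data (num3 is made up for illustration and is not from the question):

```r
# Hypothetical three-year data, for illustration only
num3 <- data.frame(
  x4 = c(1L, 2L, 3L, 2L, 3L, 3L, 4L),
  x5 = c(2017L, 2017L, 2017L, 2018L, 2018L, 2019L, 2019L)
)

# One vector of x4 values per year
x4.by.year <- split(num3$x4, num3$x5)

# Fold intersect() pairwise over the list; works for any number of years
inds <- Reduce(intersect, x4.by.year)

# Keep rows whose x4 value is present in every year
res3 <- num3[num3$x4 %in% inds, ]
res3
```

Here only x4 = 3 occurs in all three years, so only its rows remain.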
Upvotes: 1