psysky

Reputation: 3195

How to delete rows with different values for variables

My dataset is num:

num
structure(list(x1 = c(52L, 74L, 61L, 63L, 44L), x2 = c(32L, 96L, 
83L, 35L, 95L), x3 = c(9L, 36L, 7L, 33L, 67L), x4 = c(1L, 2L, 
3L, 2L, 3L), x5 = c(2017L, 2017L, 2017L, 2018L, 2018L)), .Names = c("x1", 
"x2", "x3", "x4", "x5"), class = "data.frame", row.names = c(NA, 
-5L))

The dataset has the variables x4 (number) and x5 (year). The problem: the x4 values for 2017 do not match the x4 values for 2018. For example, in my reproducible example x4 contains the number 1 for 2017, but there is no number 1 for 2018. So the row with number 1 must be deleted from the dataset, along with any other x4 values that exist for 2017 but not for 2018. How can I do that?

Update

In the original dataset, 2017 has 406 numbers but 2018 has 1500. I need code that keeps the numbers shared by 2017 and 2018, and also keeps the 2018 numbers from 407 to 1500 that have no 2017 counterpart.

dput with the number 1500 added:

structure(list(x1 = c(52L, 74L, 61L, 63L, 44L, 44L), x2 = c(32L, 
96L, 83L, 35L, 95L, 95L), x3 = c(9L, 36L, 7L, 33L, 67L, 67L), 
    x4 = c(1L, 2L, 3L, 2L, 3L, 1500L), x5 = c(2017L, 2017L, 2017L, 
    2018L, 2018L, 2018L)), .Names = c("x1", "x2", "x3", "x4", 
"x5"), class = "data.frame", row.names = c(NA, -6L))

Expected output:

x1  x2  x3  x4       x5
74  96  36  2       2017
61  83  7   3       2017
63  35  33  2       2018
44  95  67  3       2018
44  95  67  1500    2018

Upvotes: 0

Views: 75

Answers (3)

akrun

Reputation: 887108

Here is one option with tidyverse

library(dplyr)
num %>%
   group_by(x4) %>% 
   filter(n_distinct(x5)  == n_distinct(.$x5))
# A tibble: 4 x 5
# Groups: x4 [2]
#     x1    x2    x3    x4    x5
#  <int> <int> <int> <int> <int>
#1    74    96    36     2  2017
#2    61    83     7     3  2017
#3    63    35    33     2  2018
#4    44    95    67     3  2018
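
The filter keeps an x4 group only when it spans as many distinct years as the whole dataset. As a rough base R sketch of the same check (the helper names n_years, years_per_x4 and kept are illustrative, not part of the original answer):

```r
# Example data from the question
num <- data.frame(
  x1 = c(52L, 74L, 61L, 63L, 44L),
  x2 = c(32L, 96L, 83L, 35L, 95L),
  x3 = c(9L, 36L, 7L, 33L, 67L),
  x4 = c(1L, 2L, 3L, 2L, 3L),
  x5 = c(2017L, 2017L, 2017L, 2018L, 2018L)
)

# Number of distinct years in the whole dataset
n_years <- length(unique(num$x5))

# For every row, the number of distinct years in which its x4 value occurs
years_per_x4 <- ave(num$x5, num$x4, FUN = function(y) length(unique(y)))

# Keep only the rows whose x4 value occurs in every year
kept <- num[years_per_x4 == n_years, ]
kept
```

On the example data this should keep rows 2 to 5, matching the dplyr result above.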

Upvotes: 1

Maurits Evers

Reputation: 50668

You can do the following:

# Get x4 values present for all years
x4.all <- Reduce(
   function(a, b) intersect(a, b),
   lapply(split(num, num$x5), function(x) x$x4))

# Select entries where x4 is an element of x4.all
subset(num, x4 %in% x4.all)
#  x1 x2 x3 x4   x5
#2 74 96 36  2 2017
#3 61 83  7  3 2017
#4 63 35 33  2 2018
#5 44 95 67  3 2018

Explanation: We use Reduce(function(a, b) intersect(a, b), ...) to calculate the intersection of x4 values across all x5 (year) groups; we then keep only the entries whose x4 value is in x4.all, i.e. present in all years.
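
Note that the intersection also drops x4 values that occur only in 2018 (such as 1500 in the updated data). For the updated requirement in the question, one possible sketch is to treat 2018 as the reference year and keep every row whose x4 value occurs in that year. This is an assumption about the intent, read off the expected output; ref_year, x4.ref and res are illustrative names:

```r
# Updated example data from the question (x4 = 1500 occurs only in 2018)
num <- data.frame(
  x1 = c(52L, 74L, 61L, 63L, 44L, 44L),
  x2 = c(32L, 96L, 83L, 35L, 95L, 95L),
  x3 = c(9L, 36L, 7L, 33L, 67L, 67L),
  x4 = c(1L, 2L, 3L, 2L, 3L, 1500L),
  x5 = c(2017L, 2017L, 2017L, 2018L, 2018L, 2018L)
)

# Assumption: 2018 is the reference year whose x4 values must all be kept
ref_year <- 2018L

# x4 values that occur in the reference year
x4.ref <- unique(num$x4[num$x5 == ref_year])

# Drop rows whose x4 value never occurs in the reference year
res <- subset(num, x4 %in% x4.ref)
res
```

With the updated data this should drop only the 2017 row with x4 = 1 and keep the 2018 row with x4 = 1500.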

Upvotes: 3

erocoar

Reputation: 5893

You could e.g. first retrieve those numbers that are present for every year, then index with them.

inds <- do.call(intersect, unname(by(num, num$x5, function(x) x$x4)))
num[num$x4 %in% inds, ]

  x1 x2 x3 x4   x5
2 74 96 36  2 2017
3 61 83  7  3 2017
4 63 35 33  2 2018
5 44 95 67  3 2018

Upvotes: 1
