Reputation: 391
I have a dataset like this -
tibble(
year = c (2005, 2005, 2005,
2006, 2006, 2006,
2007,2007,2007),
var1 = c (2136,2136,2136,
2285,2285,2285,
2968,2968,2968),
var2 = c (125860,1318,127464,
8972,2136,5046,
4739,1619,10063)
)
I want to filter the dataset like this -
first group by the data by variables year
and var1
, then compare columns var1
and var2
,
if it is found that the values in columns var1
and var2
match, then I want to filter all of them.
For the example dataset, if we look, we can see in var1
2136 in 2005 matches with 2136 in var2
in 2006. therefore, I would like to filter both 2005 and 2006 rows because in both columns var1
and var2
, they have a common value - 2136.
So the expected resulted will look like this -
tibble(
year = c (2005, 2005, 2005,
2006, 2006, 2006
),
var1 = c (2136,2136,2136,
2285,2285,2285),
var2 = c (125860,1318,127464,
8972,2136,5046)
)
Can anybody help?
Upvotes: 3
Views: 763
Reputation: 21908
This may sound verbose but I think it's quite reliable:
library(dplyr)
library(purrr)
df %>%
filter(year %in%
(combn(unique(df$year), 2) %>%
t() %>%
as.data.frame() %>%
mutate(common = pmap_lgl(., ~ length(intersect(df$var1[df$year == ..1], df$var2[df$year == ..2])) > 0)) %>%
filter(common) %>%
select(-common) %>%
unlist() %>%
unique()))
# A tibble: 6 x 3
year var1 var2
<dbl> <dbl> <dbl>
1 2005 2136 125860
2 2005 2136 1318
3 2005 2136 127464
4 2006 2285 8972
5 2006 2285 2136
6 2006 2285 5046
Upvotes: 1
Reputation: 26218
Rather easy in tidyverse. I am using purrr::map2_lgl()
apart from usual dplyr functions here.
library(tidyverse)
df <- tibble(
year = c (2005, 2005, 2005,
2006, 2006, 2006,
2007,2007,2007),
var1 = c (2136,2136,2136,
2285,2285,2285,
2968,2968,2968),
var2 = c (125860,1318,127464,
8972,2136,5046,
4739,1619,10063)
)
df %>%
mutate(xx = map2_lgl(var1, var2, ~ .x %in% var2 | .y %in% var1)) %>%
group_by(year, var1) %>%
filter(any(xx))
#> # A tibble: 6 x 4
#> # Groups: year, var1 [2]
#> year var1 var2 xx
#> <dbl> <dbl> <dbl> <lgl>
#> 1 2005 2136 125860 TRUE
#> 2 2005 2136 1318 TRUE
#> 3 2005 2136 127464 TRUE
#> 4 2006 2285 8972 FALSE
#> 5 2006 2285 2136 TRUE
#> 6 2006 2285 5046 FALSE
Created on 2021-06-23 by the reprex package (v2.0.0)
You may remove xx
after ungroup, if you want. I kept this for explaining things.
Upvotes: 1
Reputation: 160427
A non-dplyr
suggestion:
uniqyears <- unique(df1$year)
keepyears <- sapply(uniqyears,
function(yr) any(df1$var1[df1$year == yr] %in% df1$var2[df1$year != yr] |
df1$var2[df1$year == yr] %in% df1$var1[df1$year != yr]))
keepyears
# [1] TRUE TRUE FALSE
filter(df1, year %in% uniqyears[keepyears])
# # A tibble: 6 x 3
# year var1 var2
# <dbl> <dbl> <dbl>
# 1 2005 2136 125860
# 2 2005 2136 1318
# 3 2005 2136 127464
# 4 2006 2285 8972
# 5 2006 2285 2136
# 6 2006 2285 5046
Upvotes: 0