Reputation: 213
I have a df that looks like this:
I guess it will work some with dplyr and duplicates. Yet I don't know how to address multiple columns while distinguishing between a grouped variable.
from to group
1 2 metro
2 4 metro
3 4 metro
4 5 train
6 1 train
8 7 train
I want to find the ids
which exist in more than one group
variable.
The expected result for the sample df
is: 1
and 4
. Because they exist in the metro and the train group.
Thank you in advance!
Upvotes: 1
Views: 1208
Reputation: 28705
Convert data to long format and count unique values, using data.table
. melt
is used to convert to long format, and data table allows filtering in the i
part of df1[ i, j, k]
, grouping in the k
part, and pull
ing in the j
part.
library(data.table)
library(magrittr)
setDT(df1)
melt(df1, 'group') %>%
.[, .(n = uniqueN(group)), value] %>%
.[n > 1, unique(value)]
# [1] 1 4
Upvotes: 1
Reputation: 887651
We gather
the 'from', 'to' columns to 'long' format, grouped by 'val', filter
the groups having more than one unique elements, then pull
the unique 'val' elements
library(dplyr)
library(tidyr)
df1 %>%
gather(key, val, from:to) %>%
group_by(val) %>%
filter(n_distinct(group) > 1) %>%
distinct(val) %>%
pull(val)
#[1] 1 4
Or using base R
we can just table
to find the frequency, and get the ids out of it
out <- with(df1, colSums(table(rep(group, 2), unlist(df1[1:2])) > 0)) > 1
names(which(out))
#[1] "1" "4"
df1 <- structure(list(from = c(1L, 2L, 3L, 4L, 6L, 8L), to = c(2L, 4L,
4L, 5L, 1L, 7L), group = c("metro", "metro", "metro", "train",
"train", "train")), class = "data.frame", row.names = c(NA, -6L
))
Upvotes: 1
Reputation: 389175
Using base R we can split
the first two columns based on group
and find the intersecting value between the groups using intersect
Reduce(intersect, split(unlist(df[1:2]), df$group))
#[1] 1 4
Upvotes: 4