Reputation: 165
I have four ID columns in my table (called MGIT): Ext_ID_1 through _4. Sometimes the same number is present in different rows in them and I need to filter these rows for further analysis, example:
Starting state:
Ext_ID Ext_ID_4
1111 2222
3333 4444
5555 1111
6666 7777
8888 9999
9999 1010
Desired filtered outcome:
Ext_ID Ext_ID_4
1111 2222
5555 1111
8888 9999
9999 1010
I would only like to compare them two at a time for clearer management.
Through previous questions in StackOverflow I have found the following code:
Dupl = MGIT[,c('Ext_ID','Ext_ID_4')]
Result <- MGIT[duplicated(Dupl) | duplicated(Dupl, fromLast=TRUE),]
But it's giving back duplicated only values inside the columns (actually, only inside Ext_ID, but I believe ID_4 doesn't have duplicated values within itself).
I am a beginner at programming and know nothing about the language.
Upvotes: 3
Views: 819
Reputation: 3212
I assume you want this to work for all your four columns, and not just the two in your example.
Here is a solution that will work no matter how many "Ext_ID" columns you have:
library(dplyr)
# Recreate the data from your example
df <- tibble::tribble(
~Ext_ID, ~Ext_ID_4,
1111, 2222,
3333, 4444,
5555, 1111,
6666, 7777,
8888, 9999,
9999, 1010
)
# The actual code you need - just replace `df` with the name of your table
df %>%
filter(
if_any(
starts_with("Ext_ID"),
~ .x %>% purrr::map_lgl(~sum(df == .x) > 1)
)
)
# The output:
#> # A tibble: 4 x 2
#> Ext_ID Ext_ID_4
#> <dbl> <dbl>
#> 1 1111 2222
#> 2 5555 1111
#> 3 8888 9999
#> 4 9999 1010
Explaination:
~ .x %>% purrr::map_lgl(~sum(df == .x) > 1)
checks whether each value in a column shows up more than once in the dataframe.starts_with("Ext_ID")
makes sure that this is done for all columns starting with "Ext_ID"if_any()
makes filter()
keep rows where there are at least 1 duplicate value in the df for any of the columns.Upvotes: 1
Reputation: 52004
A base R solution:
#Select whatever column you want
cols <- unlist(dat[, c('Ext_ID','Ext_ID_4')])
#Get the values that appear at least twice
uni <- unique(cols[ave(cols, cols, FUN=length) > 1])
#Filter if any of the row values match with uni
dat[apply(apply(dat, 1, function(x) x %in% uni), 2, any), ]
Ext_ID Ext_ID_4
1 1111 2222
3 5555 1111
5 8888 9999
6 9999 1010
Upvotes: 0