Stataq
Stataq

Reputation: 2301

How to filter a database on other data

I have a dataset df that looks like this:

enter image description here

and another two datasets that look like this:

Dataset A:

enter image description here

Dataset B:

enter image description here

I want to build a data C<-df %>% keep (name(df) %in% A$Data) and then check if B$Data show up in C or not. what should I do?

Sample data can be get from:

df<-structure(list(cy3.CSV = c("cy3.CSV", NA, NA), cy6.CSV = c("cy1.CSV", 
"cy24.CSV", "cy6.CSV"), dlt.CSV = c("dlt.CSV", NA, NA), dm.CSV = c("dm.CSV", 
NA, NA), dov.CSV = c("dov.CSV", "dov_1.CSV", NA), dov_1.CSV = c("dov_1.CSV", 
NA, NA), ds.CSV = c("ds.CSV", "ds_1.CSV", NA)), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame"))

A<-structure(list(Data = c("cy6.CSV", "dov.CSV", "ds.CSV"), match = c(3, 
2, 2)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"
))

B<-structure(list(Data = c("cy3.CSV", "dlt.CSV", "dm.CSV", "dov_1.CSV"
), match = c(1, 1, 1, 1)), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"))

Thanks.

enter image description here

Upvotes: 0

Views: 60

Answers (2)

akrun
akrun

Reputation: 887088

We can place the dataset in a list and then do it in a loop

library(dplyr)
library(purrr)
map(list(A, B), ~ .x %>%
           mutate(in_df = Data %in% union(colnames(df), unlist(df))))

-output

#[[1]]
# A tibble: 3 x 3
#  Data    match in_df
#  <chr>   <dbl> <lgl>
#1 cy6.CSV     3 TRUE 
#2 dov.CSV     2 TRUE 
#3 ds.CSV      2 TRUE 

#[[2]]
# A tibble: 4 x 3
#  Data      match in_df
#  <chr>     <dbl> <lgl>
#1 cy3.CSV       1 TRUE 
#2 dlt.CSV       1 TRUE 
#3 dm.CSV        1 TRUE 
#4 dov_1.CSV     1 TRUE 

Based on the comments, may be this helps

C <- df %>%
        select_if(~ any(. %in% A$Data))

# dplyr version >= 1.0
C %>% 
   mutate(across(everything(), ~ replace(., is.na(.), intersect(B$Data, .)[1])))
# A tibble: 3 x 3
#  cy6.CSV  dov.CSV   ds.CSV  
#  <chr>    <chr>     <chr>   
#1 cy1.CSV  dov.CSV   ds.CSV  
#2 cy24.CSV dov_1.CSV ds_1.CSV
#3 cy6.CSV  dov_1.CSV <NA>    

# dplyr version < 1.0

C %>% 
   mutate_all( ~ replace(., is.na(.), intersect(B$Data, .)[1]))

Upvotes: 3

Vincent
Vincent

Reputation: 17715

Convert your df to a vector, making sure to preserve column names. Then, check for the condition using %in%:

m <- c(colnames(df), unlist(df))

B$in_df <- B$Data %in% m
A$in_df <- A$Data %in% m

A
#>      Data match in_df
#> 1 cy6.CSV     3  TRUE
#> 2 dov.CSV     2  TRUE
#> 3  ds.CSV     2  TRUE

B
#>        Data match in_df
#> 1   cy3.CSV     1  TRUE
#> 2   dlt.CSV     1  TRUE
#> 3    dm.CSV     1  TRUE
#> 4 dov_1.CSV     1  TRUE

Upvotes: 3

Related Questions