user6271572

Comparing two tables in R

I have a table of reference intervals, where x is the start position and y is the end position.

|---------------------|------------------|
|           x         |        y         |
|---------------------|------------------|
|          10         |         35       |
|---------------------|------------------|
|          58         |         89       |
|---------------------|------------------|

Then I have another table with single positions. My goal is to check whether any position in this second table falls within an interval in the first table, that is, between x and y.

|---------------------|
|          12         |     
|---------------------|
|          27         |       
|---------------------|
|          65         |
|---------------------|

How can I check this? I can't use any of the joins from dplyr, or even unique().

Upvotes: 1

Views: 1050

Answers (3)

ypa y yhm

Reputation: 219

You can use diffdf::diffdf:

#' Assumes two data frames, `df.a` and `df.b`, and the magrittr pipe
library (magrittr)

#' Simple diff by row number
df.a %>% diffdf::diffdf (df.b)

#' Order both, then diff by row number
data.table::setorderv (df.a) %>% diffdf::diffdf (data.table::setorderv (df.b))

#' Diff by the key(s) you supply (the diffdf argument is named `keys`)
#' Taken together, key1, key2, ... should uniquely identify a row in both data frames.
df.a %>% diffdf::diffdf (df.b, keys = c('key1','key2'))

Here is a tool for diffing multiple RDS files at once (it also serves as a more complex diffdf demo):

rdses.compare = 
\ (orderf = \ (a) a) 
\ (dirpath.a, dirpath.b) 
\ (keys) (\ (.sep) 
    dirpath.a %>% base::c (dirpath.b) %>% base::`names<-` (.,.) %>% 
        base::lapply (\ (p) p %>% base::list.files (full.names = T)) %>% 
        base::Reduce (\ (a,b) a %>% base::paste (b, sep = .sep), x = .) %>% 
        base::`names<-` (.,.) %>% 
        base::strsplit (.sep) %>% 
        future.apply::future_lapply (\ (x) x %>% 
            base::`names<-` (.,.) %>% 
            base::lapply (base::readRDS) %>% 
            base::lapply (orderf) %>% 
            base::Reduce (\ (a,b) a %>% 
                diffdf::diffdf (b, keys = keys), x = .) %>% 
            {.}) %>% 
        {.}) (" <> ") %>% 
    {.} ;

rdsdirs.compare = 
\ (orderf = data.table::setorderv) 
\ (path.a, path.b) 
\ (dir) (\ (`%rdses.compare%`) 
    path.a %>% base::c (path.b) %>% 
        file.path (dir) %>% 
        {.[1] %rdses.compare% .[2]}
    ) (rdses.compare (orderf)) ;

`%rdses.compare%` = rdses.compare (\ (a) a)
`%rdses.compare.ord%` = rdses.compare (data.table::setorderv)

`%rdsdirs.compare%` = rdsdirs.compare (\ (a) a)
`%rdsdirs.compare.ord%` = rdsdirs.compare (data.table::setorderv)
#' Parallel run setting
future::plan (future::multisession)

#' Compare two directories that both contain the same number of RDS files with the same names
(dir.a %rdses.compare% dir.b) (keys) -> res

#' Compare the same-named directory located under two different paths:
(path.a %rdsdirs.compare% path.b) ('player_one') (keys) -> res

#' `keys` can be `c('key1','key2',...)` or `NULL`
#' 

#' Then you can filter out all reports that found no issues
res %<>% base::Filter (\ (i) base::length (i) > 0, x = .)

#' GC if you need
future::plan (future::sequential); base::gc ();

This tool requires that the RDS files in both directories have the same (or at least corresponding) names, and that those directories contain only RDS files.

Upvotes: 0

Uwe

Reputation: 42544

Version 1.9.8 of data.table (on CRAN 25 Nov 2016) introduced non-equi joins which can be used instead of foverlaps():

setDT(df1)[setDT(df2), on = .(x <= z, y >= z), which = TRUE]
[1]  1  1  2 NA

Note that the second table differs from OP's data as a fourth row has been added which doesn't match any of the intervals.

Data

df1 <- data.frame(x = c(10, 58), y = c(35, 89))
df2 <- data.frame(z = c(12, 27, 65, 90))
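
For readers who cannot install data.table, the same index vector can be reproduced in base R. This is a minimal sketch and not part of the original answer: it assumes closed intervals (both bounds inclusive, matching `x <= z, y >= z`) and keeps only the first matching interval for each position. The names `idx` and `in_interval` are illustrative.

```r
# Same data as in the answer above
df1 <- data.frame(x = c(10, 58), y = c(35, 89))
df2 <- data.frame(z = c(12, 27, 65, 90))

# For each position, look up the first interval [x, y] that contains it;
# return NA when no interval contains the position.
idx <- vapply(df2$z, function(p) {
  hit <- which(df1$x <= p & df1$y >= p)
  if (length(hit) > 0) hit[1] else NA_integer_
}, integer(1))

# Flag positions that fall inside at least one interval
df2$in_interval <- !is.na(idx)
```

Here `idx` holds the row number of the matching interval in `df1` (or `NA`), so `!is.na(idx)` answers the original yes/no question for each position. The loop is quadratic in the worst case, so the non-equi join is likely the better choice for large tables.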

Upvotes: 0

akrun

Reputation: 887108

We can use foverlaps from data.table:

library(data.table)
df1 <- data.frame(x = c(10, 58), y = c(35, 89))
df2 <- data.frame(x= c(12, 27, 65), y = c(12, 27, 65))
setDT(df1, key = c('x', 'y'))
setDT(df2, key = c('x', 'y'))
foverlaps(df2, df1, type = "within", which = TRUE)$yid 
#[1] 1 1 2
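
If only a yes/no answer per position is needed, a package-free cross-check is possible with `outer()`. This is a sketch under assumptions, not the answer's method: bounds are treated as inclusive, and the vector `pos` and the names `inside` and `in_any` are introduced here for illustration.

```r
# Interval table from the answer above, plus the single positions as a vector
df1 <- data.frame(x = c(10, 58), y = c(35, 89))
pos <- c(12, 27, 65)

# inside[i, j] is TRUE when pos[i] lies in [df1$x[j], df1$y[j]]
inside <- outer(pos, df1$x, ">=") & outer(pos, df1$y, "<=")

# TRUE for every position covered by at least one interval
in_any <- rowSums(inside) > 0
```

The matrix has one row per position and one column per interval, so memory grows with the product of the two table sizes; for large inputs foverlaps is likely the more practical route.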

Upvotes: 3
