Join rows in a data frame which have similar (but not equal) values

Question

I have a data frame like this:

   SampleID Chr Start End    Strand  Value
1:   rep1     1 11001 12000     -     10
2:   rep1     1 15000 20100     -     5
3:   rep2     1 11070 12050     -     1
4:   rep3     1 14950 20090     +     20
...

I want to join rows that share the same Chr and Strand and whose Start and End coordinates are similar (say, within ±100). For the rows that get joined, I would also like to concatenate their SampleID names and their Values. For the example above, the result would be something like:

   SampleID Chr Start End    Strand  Value
1:rep1,rep2   1 11001 12000     -     10,1
2:   rep1     1 15000 20100     -     5
4:   rep3     1 14950 20090     +     20
...

Ideas? Thanks!
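One way to think about the merge: group the rows by (Chr, Strand), sort each group by Start, and greedily add each row to the first cluster whose anchor row (its first member) has Start and End both within the tolerance. Each cluster then collapses to one row that keeps the anchor's coordinates, as in the expected output, and comma-joins SampleID and Value. The sketch below is in plain Python rather than R and is only a sketch under stated assumptions. The tuple layout, the function name `merge_similar`, and the choice to compare against the first row of a cluster, so that clusters cannot drift through chains of small steps, are all my own choices and not something the question specifies.

```python
from itertools import groupby

# Hypothetical row layout: (SampleID, Chr, Start, End, Strand, Value),
# mirroring the columns of the example table.
rows = [
    ("rep1", 1, 11001, 12000, "-", 10),
    ("rep1", 1, 15000, 20100, "-", 5),
    ("rep2", 1, 11070, 12050, "-", 1),
    ("rep3", 1, 14950, 20090, "+", 20),
]


def merge_similar(rows, tol=100):
    """Greedily cluster rows with equal Chr and Strand whose Start and End are
    each within `tol` of the cluster's first (anchor) row, then collapse each
    cluster into one row with comma-joined SampleID and Value."""
    group_key = lambda r: (r[1], r[4])  # (Chr, Strand)
    ordered = sorted(rows, key=lambda r: (r[1], r[4], r[2]))  # then by Start
    merged = []
    for _, group in groupby(ordered, key=group_key):
        clusters = []
        for row in group:
            for cluster in clusters:
                anchor = cluster[0]
                if abs(row[2] - anchor[2]) <= tol and abs(row[3] - anchor[3]) <= tol:
                    cluster.append(row)
                    break
            else:  # no existing cluster matched: start a new one
                clusters.append([row])
        for cluster in clusters:
            anchor = cluster[0]
            merged.append((
                ",".join(r[0] for r in cluster),       # concatenated SampleIDs
                anchor[1], anchor[2], anchor[3], anchor[4],
                ",".join(str(r[5]) for r in cluster),  # concatenated Values
            ))
    return merged


if __name__ == "__main__":
    for r in merge_similar(rows):
        print(r)
```

On the example rows this yields one merged row ("rep1,rep2", 1, 11001, 12000, "-", "10,1"), while rep1 at 15000-20100 and rep3 stay separate because their strands differ.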

EDIT:

I found the fuzzyjoin package for R (https://cran.r-project.org/web/packages/fuzzyjoin/index.html). Does anyone have experience with this package?
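The idea behind a fuzzy join can be illustrated with a "difference"-style self-join: pair up rows whose keys match exactly (Chr, Strand) and whose numeric keys (Start, End) differ by at most a threshold. The minimal Python sketch below is not the fuzzyjoin API. The names `fuzzy_pairs` and `components` are hypothetical. Because such pairwise matches are not transitive, the sketch also groups the matched pairs into connected components with a small union-find before any merging.

```python
from itertools import combinations

# Hypothetical row layout: (SampleID, Chr, Start, End, Strand, Value).
rows = [
    ("rep1", 1, 11001, 12000, "-", 10),
    ("rep1", 1, 15000, 20100, "-", 5),
    ("rep2", 1, 11070, 12050, "-", 1),
    ("rep3", 1, 14950, 20090, "+", 20),
]


def fuzzy_pairs(rows, tol=100):
    """Self-join: index pairs (i, j) with equal Chr and Strand whose Start
    and End each differ by at most `tol`."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        same_key = (a[1], a[4]) == (b[1], b[4])
        close = abs(a[2] - b[2]) <= tol and abs(a[3] - b[3]) <= tol
        if same_key and close:
            pairs.append((i, j))
    return pairs


def components(n, pairs):
    """Union-find: turn matched pairs into groups of row indices."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    pairs = fuzzy_pairs(rows)
    print(pairs)
    print(components(len(rows), pairs))
```

Here only rows 0 and 2 (rep1 and rep2 on the minus strand) match. Note that the pairwise check compares all pairs, so it is quadratic in the number of rows. The connected-component step can also chain rows A, B, C together even when A and C are more than `tol` apart.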

EDIT2:

It would also be nice to have the option of concatenating just one of the variables (SampleID or Value).
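Concatenating only some columns comes down to how a cluster of matched rows is collapsed: the chosen columns are comma-joined, and every other column keeps the value from the first row. The small Python sketch below shows this with a hypothetical `concat` parameter and rows stored as dicts. Keeping the first row's value for non-concatenated columns is an assumption; a sum or mean might suit Value better.

```python
# Hypothetical column order, matching the example table.
COLUMNS = ("SampleID", "Chr", "Start", "End", "Strand", "Value")


def collapse(cluster, concat=("SampleID", "Value")):
    """Collapse a list of row dicts into one row: columns named in `concat`
    are comma-joined; every other column keeps the first row's value."""
    out = {}
    for col in COLUMNS:
        if col in concat:
            out[col] = ",".join(str(r[col]) for r in cluster)
        else:
            out[col] = cluster[0][col]
    return out


# The two rows that should be joined in the example.
cluster = [
    {"SampleID": "rep1", "Chr": 1, "Start": 11001, "End": 12000, "Strand": "-", "Value": 10},
    {"SampleID": "rep2", "Chr": 1, "Start": 11070, "End": 12050, "Strand": "-", "Value": 1},
]

if __name__ == "__main__":
    print(collapse(cluster, concat=("SampleID",)))  # only SampleID joined
    print(collapse(cluster, concat=("Value",)))     # only Value joined
```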
