Frank B.

Reputation: 1883

Identifying Overlapping Numeric Ranges in R

I want to create a niche network of IDs whose year ranges overlap with each other.

df <- 
  data.frame(
    id = 1:5, 
    start_year = c(2010, 2010, 2011, 2013, 2014), 
    end_year = c(2014, 2012, 2018, 2015, 2020))

  id start_year end_year
1  1       2010     2014
2  2       2010     2012
3  3       2011     2018
4  4       2013     2015
5  5       2014     2020

It needs to be a pairwise comparison, which is the part that I can't figure out. For any x <-> y comparison it would look like this:

1 & 2 overlapped 3 years (2010, 2011, 2012)
1 & 3 overlapped 4 years (2011, 2012, 2013, 2014)
1 & 4 overlapped 2 years (2013, 2014)
etc

For the three examples above, all I need is a triplet of (first id, second id, number of overlapping years):

1, 2, 3
1, 3, 4
1, 4, 2
etc 

TIA

Upvotes: 0

Views: 35

Answers (1)

Mohanasundaram

Reputation: 2949

You can use combn to generate every pair of rows, then apply over those pairs to count the years the two ranges have in common:

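# combn() yields one pair of row positions per column; apply() visits each pair
result <- as.data.frame(t(apply(combn(nrow(df), 2), 2, function(x) {
  # Expand each range into its individual years
  years_1 <- df$start_year[x[1]]:df$end_year[x[1]]
  years_2 <- df$start_year[x[2]]:df$end_year[x[2]]
  # Report the actual ids (not row positions) and the count of shared years
  c(id_1 = df$id[x[1]], id_2 = df$id[x[2]], overlap = sum(years_1 %in% years_2))
})))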

result

   id_1 id_2 overlap
1     1    2       3
2     1    3       4
3     1    4       2
4     1    5       1
5     2    3       2
6     2    4       0
7     2    5       0
8     3    4       3
9     3    5       5
10    4    5       2
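
If the ranges get long or there are many ids, expanding every range into individual years can be slow. A possible alternative, sketched here on the assumption that start_year and end_year are whole years and both endpoints are inclusive, is to compute the overlap arithmetically: the shared span runs from the later start to the earlier end, floored at zero. The names pairs, i, j and result_vec are only illustrative.

```r
df <- data.frame(
  id = 1:5,
  start_year = c(2010, 2010, 2011, 2013, 2014),
  end_year = c(2014, 2012, 2018, 2015, 2020)
)

# All unordered pairs of row positions, one pair per column
pairs <- combn(nrow(df), 2)
i <- pairs[1, ]
j <- pairs[2, ]

# Inclusive overlap: (earlier end) - (later start) + 1, never below zero
overlap <- pmax(0, pmin(df$end_year[i], df$end_year[j]) -
                   pmax(df$start_year[i], df$start_year[j]) + 1)

result_vec <- data.frame(id_1 = df$id[i], id_2 = df$id[j], overlap = overlap)
```

For the sample data this should give the same overlap column as the table above. Add result_vec[result_vec$overlap > 0, ] if only the pairs that actually overlap are wanted as edges.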

Upvotes: 1
