bgrantham

Reputation: 281

R checking if the same numbers occur in multiple rows of a data frame

I have a data frame, nearest_neighbour, which lists the nearest neighbours of a point. So for point 1, the 1st nearest neighbour is point 2, the second nearest neighbour is point 3, and so on.

What is the quickest way to loop through this and check whether 4 points all share the same nearest neighbours? E.g. point 1's three nearest neighbours are 2, 3 and 4, point 2's nearest neighbours are 1, 3 and 4, etc.

  which.1 which.2 which.3
1       2       3       4
2       1       4       3
3       1       4       2
4       3       1       2
5       2       4       6
6       7       5       2

I can do it easily with if statements for just two neighbours:

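# Count points whose first nearest neighbour points back at them
nn1 <- nearest_neighbour[[1]]
count <- 0
for (j in seq_along(nn1)) {
    # isTRUE() guards against NA when a neighbour has no row of its own
    if (isTRUE(nn1[nn1[j]] == j)) {
        count <- count + 1
    }
}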

However, this method seems silly for more than 2 neighbours, as it ends up needing a lot of if statements.

Upvotes: 0

Views: 97

Answers (2)

lmo

Reputation: 38500

Here is a base R method using factor and apply (df is the nearest_neighbour data frame from the question):

groups <- factor(apply(cbind(df, seq_len(nrow(df))), 1,
                       function(i) paste(sort(i), collapse="_")))

groups
      1       2       3       4       5       6 
1_2_3_4 1_2_3_4 1_2_3_4 1_2_3_4 2_4_5_6 2_5_6_7 
Levels: 1_2_3_4 2_4_5_6 2_5_6_7

The inner function sorts a vector and collapses the result into a string separated by underscores. This function is applied to each row of a modified version of the data frame where the current row number (element ID) is added.
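To get from these labels to the answer the question asks for, one option is to tabulate them and keep the labels shared by exactly as many rows as there are points in the label. This is only a sketch: it rebuilds the example data as df, assumes every point's neighbours are distinct and never include the point itself, and the names group_sizes and mutual are illustrative rather than part of the original answer.

```r
# Rebuild the example data from the question (called df, as in the answer)
df <- data.frame(which.1 = c(2, 1, 1, 3, 2, 7),
                 which.2 = c(3, 4, 4, 1, 4, 5),
                 which.3 = c(4, 3, 2, 2, 6, 2))

# Label each row by the sorted set {row id, its neighbours}
groups <- factor(apply(cbind(df, seq_len(nrow(df))), 1,
                       function(i) paste(sort(i), collapse = "_")))

# Count how many rows carry each label
group_sizes <- table(groups)

# A label shared by ncol(df) + 1 rows means those points all list
# each other as nearest neighbours
mutual <- names(group_sizes)[as.vector(group_sizes) == ncol(df) + 1]
mutual
```

For the example data, mutual contains the single label 1_2_3_4, i.e. points 1 to 4 form one mutual group.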

Upvotes: 1

J_F

Reputation: 10352

Here is another base R solution, with a different approach:

dd <- t(apply(df, 1, function(x) table(factor(x, levels=1:max(df)))))

colSums(dd) >= 4

    1     2     3     4     5     6     7 
FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE 

So points 2 and 4 each appear as a neighbour 4 or more times.
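The same counts can probably be obtained a little more directly by flattening the data frame before tabulating. A minimal sketch, rebuilding the example data as df; the name counts is illustrative. Note that this counts how often each point is listed as a neighbour, so on its own it does not show which points form a mutual group.

```r
# Rebuild the example data from the question (called df, as in the answer)
df <- data.frame(which.1 = c(2, 1, 1, 3, 2, 7),
                 which.2 = c(3, 4, 4, 1, 4, 5),
                 which.3 = c(4, 3, 2, 2, 6, 2))

# Flatten all neighbour columns and count occurrences of each point id;
# fixing the factor levels keeps ids that never appear (count 0)
counts <- table(factor(unlist(df), levels = seq_len(max(df))))

# Points listed as a neighbour at least 4 times
counts >= 4
```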

Upvotes: 0
