Reputation: 281
I have a data frame, nearest_neighbour, which lists the nearest neighbours of a point. So for point 1, the 1st nearest neighbour is point 2, the second nearest neighbour is point 3, and so on.
What is the quickest way to loop through this and check whether 4 points all share the same nearest neighbours? E.g. point 1's three nearest neighbours are 2, 3 and 4, point 2's nearest neighbours are 1, 3 and 4, etc.
  which.1 which.2 which.3
1       2       3       4
2       1       4       3
3       1       4       2
4       3       1       2
5       2       4       6
6       7       5       2
I can do it easily with if statements for just two neighbours:
count <- 0
# Count points j whose nearest neighbour also has j as its nearest neighbour
for (j in seq_along(nearest_neighbour[[1]])) {
  if (nearest_neighbour[[1]][nearest_neighbour[[1]][j]] == j) {
    count <- count + 1
  }
}
However, this method seems silly for more than 2 neighbours, as it ends up needing a lot of if statements.
Upvotes: 0
Views: 97
Reputation: 38500
Here is a base R method using factor and apply:
groups <- factor(apply(cbind(df, seq_len(nrow(df))), 1,
                       function(i) paste(sort(i), collapse="_")))
groups
      1       2       3       4       5       6
1_2_3_4 1_2_3_4 1_2_3_4 1_2_3_4 2_4_5_6 2_5_6_7
Levels: 1_2_3_4 2_4_5_6 2_5_6_7
The inner function sorts a vector and collapses the result into a string separated by underscores. This function is applied to each row of a modified version of the data frame in which the current row number (element ID) is appended as an extra column, so each label describes the set made up of a point and its three neighbours.
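To get from these labels to the groups the question asks about, one option is to count how often each label occurs: since every label includes the point's own ID, a label carried by 4 points should mean those 4 points are each other's three nearest neighbours. A minimal sketch, assuming df holds the example data from the question; the names group_sizes, closed_groups and members are only illustrative:

```r
# Example data from the question (assumed to be stored in `df`)
df <- data.frame(which.1 = c(2, 1, 1, 3, 2, 7),
                 which.2 = c(3, 4, 4, 1, 4, 5),
                 which.3 = c(4, 3, 2, 2, 6, 2))

# Label each point by the sorted set {point itself, its three neighbours}
groups <- factor(apply(cbind(df, seq_len(nrow(df))), 1,
                       function(i) paste(sort(i), collapse = "_")))

# Count how many points carry each label
group_sizes <- table(groups)

# Labels shared by exactly 4 points (illustrative name)
closed_groups <- names(group_sizes)[group_sizes == 4]

# Row numbers of the points that belong to such a group
members <- which(groups %in% closed_groups)
```

With the example data, only the first four points end up in such a group.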
Upvotes: 1
Reputation: 10352
Here is also a base R solution, but with a different approach:
dd <- t(apply(df, 1, function(x) table(factor(x, levels=1:max(df)))))
colSums(dd) >= 4
    1     2     3     4     5     6     7
FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE
So points 2 and 4 each appear at least 4 times among the listed nearest neighbours.
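The row-wise tabulation followed by colSums amounts to counting how often each point ID occurs anywhere in the data frame, so the same counts can probably be obtained in one step. A rough sketch, again assuming df holds the example data from the question (counts is an illustrative name). Note that this measures how often a point is listed as a neighbour, not whether a set of points is mutually closed:

```r
# Example data from the question (assumed to be stored in `df`)
df <- data.frame(which.1 = c(2, 1, 1, 3, 2, 7),
                 which.2 = c(3, 4, 4, 1, 4, 5),
                 which.3 = c(4, 3, 2, 2, 6, 2))

# Tabulate every neighbour ID at once; fixing the levels keeps IDs
# that never occur as a neighbour in the result with a count of 0
counts <- table(factor(unlist(df), levels = seq_len(max(df))))

# Points listed as a neighbour at least 4 times
counts >= 4
```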
Upvotes: 0