Reputation: 469
I have a data table with two columns. For each column, I want to count the number of rows with the same value for row 1 and a row 2 value that is +/- 10 of it's value for column 2. Here is an example of what some data and the result may look like:
Table: info
C1 C2 near
a 5 0
a 25 1
a 27 1
b 8 1
b 12 2
b 20 1
c 10 0
I've written an ugly for loop that does this calculation (see below), but I'm wondering if anyone has a more efficient solution that would work better on 100k+ row tables.
for (f in 1:5) {
n <- info$C2[f]
info$near[f] <- nrow(subset(info, info$C1 == info$C1[f] & info$C2 >= n-10 & info$C2 <= n+10))-1
}
Thanks!
Upvotes: 0
Views: 733
Reputation: 149
I don't know if I quit understand your question, by it seem to me that you can achieve good performance with sqldf library. It isn't the smartest way. But it will work well.
If you want look this page https://www.google.com.br/amp/s/www.r-bloggers.com/make-r-speak-sql-with-sqldf/amp/
Upvotes: 0
Reputation: 11728
A more general version of the solution provided by @Gregor:
library(dplyr)
info %>%
group_by(C1) %>%
mutate(near = colSums(abs(outer(C2, C2, "-")) <= 10) - 1)
Upvotes: 0
Reputation: 145755
With dplyr
:
library(dplyr)
info %>% group_by(C1) %>% mutate(near = abs(diff(C2)) < 10)
This assumes that each value of C1
has 2 rows.
Upvotes: 2