Reputation: 469

R: Counting rows with similar values

I have a data table with two columns. For each column, I want to count the number of rows with the same value for row 1 and a row 2 value that is +/- 10 of it's value for column 2. Here is an example of what some data and the result may look like:

Table: info
C1     C2       near
a      5        0
a      25       1
a      27       1
b      8        1
b      12       2
b      20       1
c      10       0

I've written an ugly for loop that does this calculation (see below), but I'm wondering if anyone has a more efficient solution that would work better on 100k+ row tables.

for (f in 1:5) {
  n <- info$C2[f]
  info$near[f] <- nrow(subset(info, info$C1 == info$C1[f] & info$C2 >= n-10 & info$C2 <= n+10))-1
}

Thanks!

Upvotes: 0

Answers (3)

Alexandre Gentil

Reputation: 149

I don't know if I quit understand your question, by it seem to me that you can achieve good performance with sqldf library. It isn't the smartest way. But it will work well.

If you want look this page https://www.google.com.br/amp/s/www.r-bloggers.com/make-r-speak-sql-with-sqldf/amp/

Upvotes: 0

F. Privé

Reputation: 11728

A more general version of the solution provided by @Gregor:

library(dplyr)
info %>% 
  group_by(C1) %>% 
  mutate(near = colSums(abs(outer(C2, C2, "-")) <= 10) - 1)

Upvotes: 0

Gregor Thomas

Reputation: 145755

With dplyr:

library(dplyr)
info %>% group_by(C1) %>% mutate(near = abs(diff(C2)) < 10)

This assumes that each value of C1 has 2 rows.

Upvotes: 2

R: Counting rows with similar values

Answers (3)

Related Questions