Calculation on a vector of values in an R data.table

Question

I have the following data.table in R:

library(data.table)
dataset <- data.table(C = c("a", "b", "c"), neg = c("5, 7", "9", "3, 4, 5"), pos = c("5.05, 8", "", "2.95, 4.2"))

The table looks like this:

[image: example data.table]

I want to find overlaps between the values in the columns “neg” and “pos”. If the difference between a value in one column and a value in the other column of the same row is smaller than 0.1, I want to merge the two values by taking their mean. For example, the pair 5 and 5.05 should become 5.025. If a value has no partner within 0.1, the original value is kept. I added a picture of my idea of a possible result:

[image: possible result table]
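To make the merging rule concrete, here is a pairwise check for a single row (row c) using base R's outer(). It is only a sketch of the rule, not a way to build the final column. The helper parse_vals is a hypothetical name used for illustration.

```r
# Hypothetical helper: turn a string like "3, 4, 5" into c(3, 4, 5)
parse_vals <- function(s) as.numeric(unlist(strsplit(s, ",\\s*")))

neg <- parse_vals("3, 4, 5")
pos <- parse_vals("2.95, 4.2")

# Matrix of absolute differences: rows are neg values, columns are pos values
d <- abs(outer(neg, pos, "-"))

# Row/column indices of the neg/pos pairs that lie within 0.1 of each other
hits <- which(d < 0.1, arr.ind = TRUE)

# Mean of each matching pair (for row c only 3 and 2.95 match)
merged <- (neg[hits[, "row"]] + pos[hits[, "col"]]) / 2
merged
```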

Is there a function to do this directly, or do I have to split or rearrange the table first?

Thanks for your help!

Ronak Shah · Accepted Answer

The numbers are stored as character strings, so first you need to split them on commas, convert them to numeric and sort them. You can then calculate the differences between consecutive values and combine each run of values whose consecutive differences are at most 0.1 by taking their mean.
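To see what each step does, here is the same pipeline traced by hand for row a (neg = "5, 7", pos = "5.05, 8"). The variable names are illustrative; the comments show the intermediate values.

```r
# Row "a": split each string on a comma followed by optional whitespace
x <- strsplit("5, 7", ",\\s*")[[1]]      # "5" "7"
y <- strsplit("5.05, 8", ",\\s*")[[1]]   # "5.05" "8"

p <- sort(as.numeric(c(x, y)))           # 5.00 5.05 7.00 8.00
gaps <- diff(p)                          # 0.05 1.95 1.00

# A new group starts after every gap larger than 0.1
grp <- cumsum(c(TRUE, gaps > 0.1))       # 1 1 2 3

# Average the values within each group
res <- as.numeric(tapply(p, grp, mean))  # 5.025 7.000 8.000
res
```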

In base R, you can do this with Map and tapply:

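dataset$overlap <- Map(function(x, y) {
  # combine both columns' values for this row and sort them numerically
  p <- sort(as.numeric(c(x, y)))
  # start a new group wherever the gap to the previous value exceeds 0.1,
  # then average each group
  as.numeric(tapply(p, cumsum(c(TRUE, diff(p) > 0.1)), mean))
}, strsplit(dataset$neg, ',\\s*'), strsplit(dataset$pos, ',\\s*'))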

dataset

#   C     neg       pos                 overlap
#1: a    5, 7   5.05, 8       5.025,7.000,8.000
#2: b       9                                 9
#3: c 3, 4, 5 2.95, 4.2 2.975,4.000,4.200,5.000

dataset$overlap

#[[1]]
#[1] 5.025 7.000 8.000

#[[2]]
#[1] 9

#[[3]]
#[1] 2.975 4.000 4.200 5.000
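Because the grouping uses cumsum(c(TRUE, diff(p) > 0.1)), it chains: a run such as 5, 5.08, 5.16 becomes one group even though its ends are 0.16 apart, and two values from the same column can also be merged. The check below wraps the per-row logic above in a hypothetical helper merge_close (with the 0.1 threshold exposed as tol) to show both effects.

```r
# Hypothetical helper wrapping the per-row logic from the answer
merge_close <- function(x, y, tol = 0.1) {
  p <- sort(as.numeric(c(x, y)))
  as.numeric(tapply(p, cumsum(c(TRUE, diff(p) > tol)), mean))
}

# Chaining: each consecutive gap is 0.08, so all three values form one group
chained <- merge_close("5", c("5.08", "5.16"))
chained   # a single value, about 5.08

# Same-column merge: 5 and 5.05 both come from "neg" but are still averaged
same_col <- merge_close(c("5", "5.05"), character(0))
same_col  # 5.025
```

If only neg/pos pairs should be merged, the rows would need a pairwise matching step instead of this run-based grouping.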
