user2062207

Reputation: 955

Calculating differences between values in a vector

I have a vector that contains just over a quarter of a million values (I know, a huge amount) and I need to calculate the difference between each value and every other value. For example, taking the first value, 202.7952, I want to calculate the difference between it and every other value in my vector, discarding any difference larger than 400 in absolute value. Then I want to take the second value (202.7956) and do the same thing (including its difference with the first value). The end result I hope for is a list of all the calculated differences. For example:

0.0004
0.0125
0.0136
etc.

These would be the differences between the first value and the next three values in the list; the calculation would continue to the bottom of the list before doing the same thing for the second value. However, with a quarter of a million values in my vector, I expect this to be computationally demanding. I've produced an image to show the distribution of my data:

[Image: distribution of the values]

The values range from 200 to 1500, with the vast majority falling within the 200-500 range. I've tried to do this in Java but ran into memory issues. Does anyone know whether this is possible in R, and how I could go about it?
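For scale, a quick back-of-the-envelope check (a sketch, assuming n = 271730 as in the Java array size and 8 bytes per stored double; the class name PairCount is hypothetical): comparing every value with every other one gives n(n - 1) ordered pairs, roughly 7.4 x 10^10, so keeping all of them before filtering would need hundreds of gigabytes.

```java
// Rough size estimate for the all-pairs computation described above.
// Assumes n = 271730 values and 8 bytes per difference stored as a double.
public class PairCount {

    // Number of ordered pairs (i, j) with i != j: every value compared with every other one.
    static long orderedPairs(long n) {
        return n * (n - 1);
    }

    public static void main(String[] args) {
        long n = 271730;
        long pairs = orderedPairs(n);
        // Memory needed to hold every difference as a double, before any filtering
        double gib = pairs * 8.0 / (1024.0 * 1024 * 1024);
        System.out.printf("%d ordered pairs, about %.0f GiB as doubles%n", pairs, gib);
    }
}
```

Filtering at 400 does not shrink this much here, since most values lie between 200 and 500 and are therefore within 400 of each other.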

This is my Java code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class matrixDiff {

    public static void main(String[] args) throws IOException{

        double[] values = new double[271730];

        // Read one value per line from "file"; any missing lines leave 0.0 in the array
        try (BufferedReader br = new BufferedReader(new FileReader("file"))) {

            String value = br.readLine();

            for(int i = 0; i < values.length && value != null; i++){

                values[i] = Double.parseDouble(value);

                value = br.readLine();
            }
        }

        for(int i = 0; i < values.length; i++){

            double mzValue = values[i];

            System.out.println(mzValue);

            for(int j = 0; j < values.length; j++){

                double diff = values[j] - mzValue;

                // Skip the value itself and keep only differences within +/-400
                // (both bounds must hold, hence the absolute value)
                if(j != i && Math.abs(diff) <= 400){

                    System.out.println(diff);
                }
            }
        }
    }
}
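One way to avoid comparing every value with every other one (a swapped-in technique, not what the code above does) is to sort the values first: for each value, the partners within 400 then form one contiguous window that two pointers can track as the loop advances. A minimal sketch, assuming the values fit in a double[]; WindowedDiffs and forEachDiffWithin are hypothetical names, and the differences come out in sorted order rather than file order, passed to a callback instead of being stored.

```java
import java.util.Arrays;
import java.util.function.DoubleConsumer;

public class WindowedDiffs {

    // Passes v[j] - v[i] to sink for every ordered pair i != j whose values are within
    // maxDiff of each other (assumes maxDiff >= 0). Works on a sorted copy, so for each i
    // the in-range partners form one window [lo, hi) that only ever moves forward.
    static void forEachDiffWithin(double[] values, double maxDiff, DoubleConsumer sink) {
        double[] v = values.clone();
        Arrays.sort(v);
        int lo = 0; // first index with v[lo] >= v[i] - maxDiff
        int hi = 0; // first index with v[hi] > v[i] + maxDiff, or v.length
        for (int i = 0; i < v.length; i++) {
            while (v[lo] < v[i] - maxDiff) lo++;
            while (hi < v.length && v[hi] <= v[i] + maxDiff) hi++;
            for (int j = lo; j < hi; j++) {
                if (j != i) sink.accept(v[j] - v[i]);
            }
        }
    }

    public static void main(String[] args) {
        double[] sample = {500.0, 2.0, 1.0}; // hypothetical data
        // Prints 1.0 then -1.0: 500 is more than 400 away from both other values
        forEachDiffWithin(sample, 400, d -> System.out.println(d));
    }
}
```

Sorting only skips pairs that are out of range; with most values between 200 and 500, the number of pairs that are in range (and so the output) can still be enormous.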

Thanks

Upvotes: 1

Views: 4684

Answers (2)

Dominic Comtois

Reputation: 10401

Here's an example of how you could proceed, using a sample of 1000 values:

# filter out differences larger than K in absolute value
K = 25

v <- rnorm(n = 1000, mean = 200, sd = 10)
diffs <- vector("list", length(v))  # preallocate one list slot per value
for(i in seq_along(v)) {
  diffs[[i]] <- v[i] - v
  diffs[[i]] <- diffs[[i]][abs(diffs[[i]]) <= K]
}


# Check lengths after filtering
sapply(diffs, length)
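If only the counts are needed (what sapply(diffs, length) checks), they can be obtained without materialising any differences. A Java sketch under that assumption: sort a copy, then binary-search the bounds values[i] - k and values[i] + k for each value. NeighbourCounts and its methods are hypothetical names; unlike the R loop, the count excludes each value's own zero difference.

```java
import java.util.Arrays;

public class NeighbourCounts {

    // Index of the first element of the sorted array that is >= key
    static int lowerBound(double[] a, double key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Index of the first element of the sorted array that is > key
    static int upperBound(double[] a, double key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // For each values[i], how many other values lie within k of it
    static int[] countWithin(double[] values, double k) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int[] counts = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            int from = lowerBound(sorted, values[i] - k);
            int to = upperBound(sorted, values[i] + k);
            counts[i] = to - from - 1; // minus 1 drops values[i] itself
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] sample = {1.0, 2.0, 500.0}; // hypothetical data
        System.out.println(Arrays.toString(countWithin(sample, 400))); // [1, 1, 0]
    }
}
```

Hand-written bounds are used because Arrays.binarySearch makes no guarantee about which index it returns when the array contains duplicates.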

EDIT

I don't know whether you've considered it or already solved your problem, but one way to deal with that amount of data is to store everything in a database. For instance:

library(DBI)
library(RSQLite)
con <- dbConnect(SQLite(), "diffs.sqlite")
v <- rnorm(n = 100000, mean = 200, sd = 10)

# filter out differences larger than K in absolute value
K = 25

for(i in seq_along(v)) {
  diffs <- v[i] - v
  diffs <- diffs[abs(diffs) <= K]
  # append this value's differences to the table on disk
  dbWriteTable(con, "mytable", as.data.frame(diffs), append = TRUE)
}
dbDisconnect(con)

You can then process the stored differences with SQL queries rather than R functions, so the full set never has to fit in memory at once.
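The same append-as-you-go idea, sketched in Java (the question's language) with a plain text file standing in for the SQLite table. It mirrors the R loop one value at a time, so only the input array is held in memory; AppendDiffs and the file name diffs.txt are hypothetical, and the zero self-difference is kept just as v[i] - v keeps it.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

public class AppendDiffs {

    // For each value, compute its difference to every value (itself included, like v[i] - v),
    // keep those with |d| <= k, and write them out one per line instead of storing them.
    static long appendDiffs(double[] v, double k, Writer out) throws IOException {
        long written = 0;
        for (int i = 0; i < v.length; i++) {
            for (int j = 0; j < v.length; j++) {
                double d = v[i] - v[j];
                if (Math.abs(d) <= k) {
                    out.write(Double.toString(d));
                    out.write('\n');
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        double[] sample = {1.0, 2.0, 500.0}; // hypothetical data
        try (Writer out = new BufferedWriter(new FileWriter("diffs.txt"))) { // hypothetical file name
            System.out.println(appendDiffs(sample, 25, out) + " differences written");
        }
    }
}
```

Taking a Writer rather than a file name keeps the loop independent of the destination, so the same method can write to a file, a socket, or an in-memory buffer.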

Upvotes: 2

statespace

Reputation: 1664

Vectors are your friends in R: vectorised operations are a huge time and memory saver.

Data frame example:

df <- data.frame(x = rnorm(1000000))
df$dif <- df$x - c(NA, df$x[1:(length(df$x)-1)])

There you go: the successive differences of a million numbers in the blink of an eye.

Vector example:

x <- rnorm(1000000)
dif <- x - c(NA, x[1:(length(x)-1)])

Or even shorter:

x <- rnorm(1000000)
x <- c(NA, diff(x))
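For comparison, a Java sketch of what c(NA, diff(x)) computes (hypothetical class SuccessiveDiffs, with Double.NaN standing in for R's NA). These are differences between neighbouring elements only, not the all-pairs differences the question describes.

```java
import java.util.Arrays;

public class SuccessiveDiffs {

    // Element i holds x[i] - x[i - 1]; element 0 has no predecessor, so it gets NaN (R's NA)
    static double[] lagDiff(double[] x) {
        double[] out = new double[x.length];
        if (x.length == 0) return out;
        out[0] = Double.NaN;
        for (int i = 1; i < x.length; i++) {
            out[i] = x[i] - x[i - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 4.0, 9.0}; // hypothetical data
        System.out.println(Arrays.toString(lagDiff(x))); // [NaN, 3.0, 5.0]
    }
}
```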

To accumulate values through the vector you'll need cumsum():

x <- rnorm(1000000)
x <- cumsum(c(0, diff(x)))

Note the 0 instead of NA: an NA at the start would make every cumulative sum NA.
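A Java sketch of cumsum(c(0, diff(x))) (hypothetical class CumulativeDiffs). The running sum telescopes, so element i equals x[i] - x[0] up to floating-point rounding; seeding it with NaN instead of 0.0 would turn every element into NaN, which is why the 0 matters.

```java
import java.util.Arrays;

public class CumulativeDiffs {

    // Running total of successive differences, starting from 0.0 like c(0, diff(x))
    static double[] cumsumOfDiffs(double[] x) {
        double[] out = new double[x.length]; // out[0] stays 0.0
        double running = 0.0;
        for (int i = 1; i < x.length; i++) {
            running += x[i] - x[i - 1];
            out[i] = running; // equals x[i] - x[0] up to rounding
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 4.0, 9.0}; // hypothetical data
        System.out.println(Arrays.toString(cumsumOfDiffs(x))); // [0.0, 3.0, 8.0]
    }
}
```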

Upvotes: 3
