user3272910

Reputation: 57

Improving performance when looping in a big data set

I am doing some spatio-temporal analysis (in MATLAB) on a fairly large data set, and I am not sure which strategy is best for my script in terms of performance.

The data set is split into 10 yearly arrays of dimension (latitude, longitude, time) = (50, 60, 8760).

The general structure of my analysis is:

 for iterations=1:Big Number  

  1. Select a specific site of spatial reference (i,j).   
  2. Do some calculation on the whole time series of site (i,j). 
  3. Store the result in archive array.

 end
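The loop structure above can be sketched in MATLAB roughly as follows. This is only an illustration under stated assumptions: the array `data`, the random site selection, and the `mean` calculation are hypothetical placeholders, and the dimensions are shrunk from the real 50 x 60 x 8760.

```matlab
% Hypothetical stand-in data: one small (lat, lon, time) array.
% The real arrays in the question are 50 x 60 x 8760 per year.
nLat = 5; nLon = 6; nTime = 100;
data = rand(nLat, nLon, nTime);

nIter = 20;                    % placeholder for "Big Number"
archive = zeros(nIter, 1);     % preallocate the archive array

for k = 1:nIter
    % 1. Select a site (i,j); the random choice is only an illustration.
    i = randi(nLat);
    j = randi(nLon);

    % 2. Extract the whole time series as a column vector and process it.
    series = squeeze(data(i, j, :));
    result = mean(series);     % placeholder calculation

    % 3. Store the result in the archive array.
    archive(k) = result;
end
```

Preallocating `archive` before the loop avoids growing the array on every iteration.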

My question is:

Is it better (in terms of general performance) to have

1) all the data in big yearly (50,60,8760) arrays, loaded once as global variables. At each iteration the script has to extract one particular "site" (i,j,:) from those arrays for processing.

2) 50*60 distinct files stored in a folder, each file containing the time series of one site (a vector of dimension (total time range,1)). At each iteration the script then has to open a specific file from the folder, process the data, and close the file.
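For option 1, one possible way to get the full multi-year series of a site is to concatenate the yearly arrays along the time dimension once, before the loop. The sketch below assumes the yearly arrays sit in a cell array called `years` (a hypothetical name) and uses small dummy dimensions. At full size, and assuming double precision, ten 50 x 60 x 8760 arrays take roughly 2.1 GB, so this only works if that fits in memory.

```matlab
% Hypothetical yearly arrays, shrunk for illustration.
% Each year is filled with its own index so the result is easy to check.
nYears = 3; nLat = 5; nLon = 6; nTime = 10;
years = cell(1, nYears);
for y = 1:nYears
    years{y} = y * ones(nLat, nLon, nTime);
end

% Concatenate along the third (time) dimension, once, before the loop.
allYears = cat(3, years{:});            % nLat x nLon x (nYears*nTime)

% Extract the complete time series of one site as a column vector.
i = 2; j = 4;
series = squeeze(allYears(i, j, :));    % (nYears*nTime) x 1
```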

Upvotes: 1

Views: 73

Answers (2)

user3272910

Reputation: 57

After some experiments, it is clear that the second option, with 3000 distinct files, is much slower than working on big arrays loaded in the workspace. I did not try loading all 3000 files into the workspace before computing, though (a tad too much).

It looks like reshaping the data helps a little bit.

Thanks to all contributors for your suggestions.

Upvotes: 0

Anand

Reputation: 377

Because your computations run over the entire time series of each site, I would suggest storing the data that way, in a 3000x8760 matrix (one row per site), and doing the computations on that.

Your accesses will then be more cache-friendly.

You can reformat your data using the reshape function:

newdata = reshape(olddata,50*60,8760);

Now, instead of accessing olddata(i,j,:), you need to access newdata(sub2ind([50 60],i,j),:).
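One caveat worth checking: MATLAB stores arrays in column-major order, and reshape by itself leaves the element order in memory unchanged, so a row of newdata is still read with a stride of 3000 elements. If contiguous access is the goal, one possible variation (not part of the answer above) is to move time to the first dimension with permute, so that each site's series becomes a contiguous column. The name `bytime` and the small dimensions are assumptions for illustration. Whether this is faster in practice would need to be timed on the real data.

```matlab
% Small stand-in for the 50 x 60 x 8760 array.
nLat = 5; nLon = 6; nTime = 10;
olddata = rand(nLat, nLon, nTime);

% Move time to the first dimension, then flatten the two spatial dimensions.
% Result: nTime x (nLat*nLon), one column per site.
bytime = reshape(permute(olddata, [3 1 2]), nTime, nLat*nLon);

% The column of site (i,j) follows the same sub2ind mapping as before.
i = 2; j = 4;
col = sub2ind([nLat nLon], i, j);
series = bytime(:, col);       % contiguous in memory
```

Unlike reshape, permute copies the data, so it is best done once, before the loop.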

Upvotes: 1
