Reputation: 1001
I have a very long string, and I'd like to find all the NaN values and replace them with 'null'. This long string was converted from a 120 x 150000 cell array. The reason for converting it into a long string is that I was going to turn it into one giant SQL query, since fastinsert and datainsert can be very slow and sometimes I run out of heap space. The idea is to do the following:
exec(sqlConnection, long_string)
I tried using regexprep to replace NaN with null, but it seems very slow. Is there an alternative way?
long_string = regexprep(long_string,'NaN','null');
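For context, here is a minimal sketch of how I am assembling the query string from the cell array; the names data and mytable are placeholders, not my actual code, and quoting of text values is omitted:
% Build one giant INSERT from a cell array "data" into a
% hypothetical table "mytable"; both names are placeholders.
rowStrs = cell(size(data, 1), 1);
for r = 1:size(data, 1)
    vals = cellfun(@num2str, data(r, :), 'UniformOutput', false);
    vals = strrep(vals, 'NaN', 'null');            % NaN -> null, row by row
    rowStrs{r} = ['(' strjoin(vals, ',') ')'];
end
long_string = ['INSERT INTO mytable VALUES ' strjoin(rowStrs, ',') ';'];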
Upvotes: 1
Views: 466
Reputation: 1860
As Floris mentioned, regexp is a very powerful command and, as a result, is slower than simpler find commands. In addition to Floris's suggestion, you can try strrep, which works in your case, since you are not using any of the special powers of regexp.
Here is an example:
str = char('A' + rand(1, 120 * 15000) * ('z' - 'A'));   % random test string of characters between 'A' and 'z'
tic
str2 = strrep(str, 'g', 'null');
disp('strrep: '), toc
tic
str3 = regexprep(str, 'g','null');
disp('regexprep: '), toc
On my computer it returns:
strrep:
Elapsed time is 0.004640 seconds.
regexprep:
Elapsed time is 4.004671 seconds.
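Applied to your case, the drop-in replacement is the same one-liner:
long_string = strrep(long_string, 'NaN', 'null');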
Upvotes: 5
Reputation: 46375
regex is very powerful, but can be slow because of its flexibility. If you still have the original cell array - and assuming it contains only strings - the following line of code should work, and very fast:
cellArray(find(ismember(cellArray, 'NaN'))) = {'null'};
ismember finds all the elements in cellArray that are equal to 'NaN', returning a boolean array with the same shape as cellArray; the find operation turns these into linear indices of the matching elements, and the assignment then puts the value 'null' into all of them at once. Note the parentheses and the scalar cell on the right-hand side, which MATLAB expands to every indexed element; curly-brace indexing would fail here when there is more than one match.
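As a small self-contained demonstration of the same idea (the toy cell array here is made up):
c = {'1.5', 'NaN', '2.0'; 'NaN', '3.1', 'NaN'};
c(find(ismember(c, 'NaN'))) = {'null'};   % scalar cell expands to every matching index
disp(c)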
It must be said that 120 x 150,000 is a VERY large cell array - it will occupy over 2 GB even with just a single character in each cell (I know this because I just created a 120 x 15,000 cell array, and it was 205,500,000 bytes). It might be worth processing this in smaller chunks rather than all at once, especially if you know that the NaN values can occur only in some of the columns.
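A minimal sketch of such chunked processing, assuming the original array is still called cellArray and with an arbitrary chunk size:
chunkSize = 10000;                                 % columns per chunk; tune to available memory
nCols = size(cellArray, 2);
for c0 = 1:chunkSize:nCols
    cols = c0:min(c0 + chunkSize - 1, nCols);
    chunk = cellArray(:, cols);
    chunk(ismember(chunk, 'NaN')) = {'null'};      % replace within this chunk only
    cellArray(:, cols) = chunk;
end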
Processing a GB-sized string, especially when you can't operate in place (you are changing the size of the string with every replacement, and it is getting longer, not shorter), is going to be dead slow. If you really have no other option, you could write a short mex function to do this - that could be pretty fast.
Upvotes: 2