handles
handles

Reputation: 21

Problems with empty cell in a large matrix: MATLAB

I have a large matrix of 102730 rows in the form of text file (sample text file is attached) with some header files in it. The first column show year, the next the month, followed by the day, and value1, value2 and value 3. Some of the cells are missing/empty. I want to fill these empty cells with NaN, so that they don't interefere with the next value.

This is the input matrix:

1970 01 13  21.0   6.1 06 000.0
1970 01 14  22.4   8.1 03 000.0
1970 01 15  21.2   8.1 04 000.0
1970 01 16  22.6   9.1 04 000.0
1970 01 17  22.8   9.1 02 000.0
1970 01 18  22.9   8.9 07 000.0
1970 01 19  23.8  10.8 04 000.0
1970 01 20  21.8  12.1 10 010.5
1970 01 21  19.8       06 012.9
1970 01 22  15.3   8.5 07 000.0
1974 06 28  39.2  25.6 03 000.0
1974 06 29  41.2  30.5 05 000.0
1974 06 30  40.3  31.2 07 000.0
1974 07 01  41.3  31.5 12 000.0
1974 07 02  43.3  31.3 20 000.0
1974 07 03  41.2       16 041.6
1974 07 04  34.3  21.4 14 054.5
1974 07 05  33.1  23.8 05 000.0
1974 07 06  36.2  28.9 06 000.0
1975 04 18  36.6  20.8 12 000.0
1975 04 19  37.4  21.1 05 000.0
1975 04 20  39.9  27.0 07 000.0
1975 04 21  39.5  27.3 09 000.0
1975 04 22                     
1975 04 23  39.5  27.1 08 000.0
1975 04 24  37.7  26.0 10 000.0
1975 04 25  38.7  27.2 15 000.0

The desired output matrix:

1970 01 13  21.0   6.1 06 000.0
1970 01 14  22.4   8.1 03 000.0
1970 01 15  21.2   8.1 04 000.0
1970 01 16  22.6   9.1 04 000.0
1970 01 17  22.8   9.1 02 000.0
1970 01 18  22.9   8.9 07 000.0
1970 01 19  23.8  10.8 04 000.0
1970 01 20  21.8  12.1 10 010.5
1970 01 21  19.8  Nan  06 012.9
1970 01 22  15.3   8.5 07 000.0
1974 06 28  39.2  25.6 03 000.0
1974 06 29  41.2  30.5 05 000.0
1974 06 30  40.3  31.2 07 000.0
1974 07 01  41.3  31.5 12 000.0
1974 07 02  43.3  31.3 20 000.0
1974 07 03  41.2  Nan  16 041.6
1974 07 04  34.3  21.4 14 054.5
1974 07 05  33.1  23.8 05 000.0
1974 07 06  36.2  28.9 06 000.0
1975 04 18  36.6  20.8 12 000.0
1975 04 19  37.4  21.1 05 000.0
1975 04 20  39.9  27.0 07 000.0
1975 04 21  39.5  27.3 09 000.0
1975 04 22  Nan   Nan  Nan Nan                     
1975 04 23  39.5  27.1 08 000.0
1975 04 24  37.7  26.0 10 000.0
1975 04 25  38.7  27.2 15 000.0

As an attempt, first I tried with this:

T = readtable('sample.txt') ;

Above code didn't work since it meshed up and gave the wrong number of columns when there 2 digits before the decimal. Secondly, I found this link: Creating new matrix from cell with some empty cells disregarding empty cells

The foll. code snippet may be useful from this link, but I don't know how to read the data directly from the text pad inorder to apply this code & subsequent retrieval process:

inds = ~cellfun('isempty', elem); %elem to be replaced as sample

I also find out the method to detect empty cells here: How do I detect empty cells in a cell array?

but I couldn't figure out how to read the data from a text file considering these empty cells.

Could anyone please help?

Upvotes: 2

Views: 343

Answers (1)

Paolo
Paolo

Reputation: 26153

Since R2019a, you can simply use readmatrix:

>> myMat = readmatrix('sample.txt')

From the docs:

For delimited text files, the importing function converts empty fields in the file to either NaN (for a numeric variable) or an empty character vector (for a text variable). All lines in the text file must have the same number of delimiters. The importing function ignores insignificant white space in the file.



For previous releases, you can use detectImportOptions object when calling readtable:

% Detect options.
>> opts = detectImportOptions('sample.txt');
% Read table.
>> myTable = readtable('sample.txt',opts);
% Visualise last rows of table.
>> tail(myTable)

ans =

  8×7 table

    Var1    Var2    Var3    Var4    Var5    Var6    Var7
    ____    ____    ____    ____    ____    ____    ____

    1975     4       18     36.6    20.8     12       0 
    1975     4       19     37.4    21.1      5       0 
    1975     4       20     39.9      27      7       0 
    1975     4       21     39.5    27.3      9       0 
    1975     4       22      NaN     NaN    NaN     NaN 
    1975     4       23     39.5    27.1      8       0 
    1975     4       24     37.7      26     10       0 
    1975     4       25     38.7    27.2     15       0 

For your text file, detectImportOptions is filling missing values with NaN :

>> opts.VariableOptions

If the desired output is a matrix, you can then use table2array:

>> myMat = table2array(myTable)

Upvotes: 1

Related Questions