Reputation: 21
I have a large matrix of 102730 rows in the form of text file (sample text file is attached) with some header files in it. The first column show year, the next the month, followed by the day, and value1, value2 and value 3. Some of the cells are missing/empty. I want to fill these empty cells with NaN, so that they don't interefere with the next value.
This is the input matrix:
1970 01 13 21.0 6.1 06 000.0
1970 01 14 22.4 8.1 03 000.0
1970 01 15 21.2 8.1 04 000.0
1970 01 16 22.6 9.1 04 000.0
1970 01 17 22.8 9.1 02 000.0
1970 01 18 22.9 8.9 07 000.0
1970 01 19 23.8 10.8 04 000.0
1970 01 20 21.8 12.1 10 010.5
1970 01 21 19.8 06 012.9
1970 01 22 15.3 8.5 07 000.0
1974 06 28 39.2 25.6 03 000.0
1974 06 29 41.2 30.5 05 000.0
1974 06 30 40.3 31.2 07 000.0
1974 07 01 41.3 31.5 12 000.0
1974 07 02 43.3 31.3 20 000.0
1974 07 03 41.2 16 041.6
1974 07 04 34.3 21.4 14 054.5
1974 07 05 33.1 23.8 05 000.0
1974 07 06 36.2 28.9 06 000.0
1975 04 18 36.6 20.8 12 000.0
1975 04 19 37.4 21.1 05 000.0
1975 04 20 39.9 27.0 07 000.0
1975 04 21 39.5 27.3 09 000.0
1975 04 22
1975 04 23 39.5 27.1 08 000.0
1975 04 24 37.7 26.0 10 000.0
1975 04 25 38.7 27.2 15 000.0
The desired output matrix:
1970 01 13 21.0 6.1 06 000.0
1970 01 14 22.4 8.1 03 000.0
1970 01 15 21.2 8.1 04 000.0
1970 01 16 22.6 9.1 04 000.0
1970 01 17 22.8 9.1 02 000.0
1970 01 18 22.9 8.9 07 000.0
1970 01 19 23.8 10.8 04 000.0
1970 01 20 21.8 12.1 10 010.5
1970 01 21 19.8 Nan 06 012.9
1970 01 22 15.3 8.5 07 000.0
1974 06 28 39.2 25.6 03 000.0
1974 06 29 41.2 30.5 05 000.0
1974 06 30 40.3 31.2 07 000.0
1974 07 01 41.3 31.5 12 000.0
1974 07 02 43.3 31.3 20 000.0
1974 07 03 41.2 Nan 16 041.6
1974 07 04 34.3 21.4 14 054.5
1974 07 05 33.1 23.8 05 000.0
1974 07 06 36.2 28.9 06 000.0
1975 04 18 36.6 20.8 12 000.0
1975 04 19 37.4 21.1 05 000.0
1975 04 20 39.9 27.0 07 000.0
1975 04 21 39.5 27.3 09 000.0
1975 04 22 Nan Nan Nan Nan
1975 04 23 39.5 27.1 08 000.0
1975 04 24 37.7 26.0 10 000.0
1975 04 25 38.7 27.2 15 000.0
As an attempt, first I tried with this:
T = readtable('sample.txt') ;
Above code didn't work since it meshed up and gave the wrong number of columns when there 2 digits before the decimal. Secondly, I found this link: Creating new matrix from cell with some empty cells disregarding empty cells
The foll. code snippet may be useful from this link, but I don't know how to read the data directly from the text pad inorder to apply this code & subsequent retrieval process:
inds = ~cellfun('isempty', elem); %elem to be replaced as sample
I also find out the method to detect empty cells here: How do I detect empty cells in a cell array?
but I couldn't figure out how to read the data from a text file considering these empty cells.
Could anyone please help?
Upvotes: 2
Views: 343
Reputation: 26153
Since R2019a, you can simply use readmatrix:
>> myMat = readmatrix('sample.txt')
From the docs:
For delimited text files, the importing function converts empty fields in the file to either NaN (for a numeric variable) or an empty character vector (for a text variable). All lines in the text file must have the same number of delimiters. The importing function ignores insignificant white space in the file.
For previous releases, you can use detectImportOptions object when calling readtable:
% Detect options.
>> opts = detectImportOptions('sample.txt');
% Read table.
>> myTable = readtable('sample.txt',opts);
% Visualise last rows of table.
>> tail(myTable)
ans =
8×7 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7
____ ____ ____ ____ ____ ____ ____
1975 4 18 36.6 20.8 12 0
1975 4 19 37.4 21.1 5 0
1975 4 20 39.9 27 7 0
1975 4 21 39.5 27.3 9 0
1975 4 22 NaN NaN NaN NaN
1975 4 23 39.5 27.1 8 0
1975 4 24 37.7 26 10 0
1975 4 25 38.7 27.2 15 0
For your text file, detectImportOptions
is filling missing values with NaN
:
>> opts.VariableOptions
If the desired output is a matrix, you can then use table2array:
>> myMat = table2array(myTable)
Upvotes: 1