Mary
Mary

Reputation: 788

Mixed Data Types Matlab Tables

I wish to use the makeValidName function on the following data:

Id  Val Random          Desc
a   1.1 0.036835624 Bread Cheese
b   2.2 0.020442492 Fish Bread
c   -3.3    0.020050676 Cheese Fish
d   #N/A    0.017619332 Bread Cheese
e   -5.4    0.014973153 Fish Bread
f   6.6 0.014648887 Cheese Fish
g   -7.6    0.014071844 Bread Cheese
h   8   0.014013118 Fish Bread

However when I import the table (read from xlsx using readtable) it looks like this:

inputData =

 Id             Val              Random          Desc     
____    ____________________    ________    ______________

'a '    '1.1'                   0.036836    'Bread Cheese'
'b'     '2.2'                   0.020442    'Fish Bread'  
'c'     '-3.3'                  0.020051    'Cheese Fish' 
'd'     'ActiveX VT_ERROR: '    0.017619    'Bread Cheese'
'e'     '-5.4'                  0.014973    'Fish Bread'  
'f'     '6.6'                   0.014649    'Cheese Fish' 
'g'     '-7.6'                  0.014072    'Bread Cheese'
'h'     '8'                     0.014013    'Fish Bread'  

How do I prevent it from turning the entries in Val from numbers to strings? This makes it impossible to use makeValidName. I need to apply makeValidName across all rows and columns as the tables are very large and it is not feasible to name the appropriate columns individually. What would then be the most elegant way of achieving this?

Current code:

varnames = inputData.Properties.VariableNames;
for ii = 1:length(varnames)

inputData.(varnames{ii})= matlab.lang.makeValidName(inputData.(varnames{ii}));

end

Produces the error:

Error using matlab.lang.makeValidName (line 72) First input must be string or vector cell array of strings.

and produces undesirable results in columns such as Val:

inputData =

Id            Val             Random         Desc     
___    __________________    ________    _____________

'a'    'x1_1'                0.036836    'BreadCheese'
'b'    'x2_2'                0.020442    'FishBread'  
'c'    'x_3_3'               0.020051    'CheeseFish' 
'd'    'ActiveXVT_ERROR_'    0.017619    'BreadCheese'
'e'    'x_5_4'               0.014973    'FishBread'  
'f'    'x6_6'                0.014649    'CheeseFish' 
'g'    'x_7_6'               0.014072    'BreadCheese'
'h'    'x8'                  0.014013    'FishBread'

Upvotes: 0

Views: 261

Answers (1)

sco1
sco1

Reputation: 12214

Since it seems like utilizing Excel in the middle is creating more of a headache. I would recommend utilizing basic mode which will alleviate some of the parsing errors.

From the documentation:

basic mode is the default for systems without Excel for Windows. In basic mode, readtable:

  • Reads XLS, XLSX, XLSM, XLTX, and XLTM files only.
  • Does not support the 'Range' name-value pair argument when reading XLS files.
  • Imports all dates as Excel serial date numbers. Excel serial date numbers use a different reference date than MATLAB® date numbers.

This allows us to utilize the TreatAsEmpty name-value pair argument since it will correctly parse the numeric columns.

inputData = readtable('test.xlsx', 'Basic', 1, 'TreatAsEmpty', '#N/A');

Which returns for the sample case:

inputData = 

    Id     Val      Random          Desc     
    ___    ____    ________    ______________

    'a'     1.1    0.036836    'Bread Cheese'
    'b'     2.2    0.020442    'Fish Bread'  
    'c'    -3.3    0.020051    'Cheese Fish' 
    'd'     NaN    0.017619    'Bread Cheese'
    'e'    -5.4    0.014973    'Fish Bread'  
    'f'     6.6    0.014649    'Cheese Fish' 
    'g'    -7.6    0.014072    'Bread Cheese'
    'h'       8    0.014013    'Fish Bread'

Theoretically this should mean that the numeric data columns are double arrays and the strings remain in cell arrays. Therefore, to use matlab.lang.makeValidName you can test each column with iscell to see if it's a cell array:

varnames = inputData.Properties.VariableNames;
for ii = 1:length(varnames)
    if iscell(inputData.(varnames{ii}))
        % If they're strings they're in a cell array
        inputData.(varnames{ii})= matlab.lang.makeValidName(inputData.(varnames{ii}));
    end
end

Which returns:

inputData = 

    Id     Val      Random         Desc     
    ___    ____    ________    _____________

    'a'     1.1    0.036836    'BreadCheese'
    'b'     2.2    0.020442    'FishBread'  
    'c'    -3.3    0.020051    'CheeseFish' 
    'd'     NaN    0.017619    'BreadCheese'
    'e'    -5.4    0.014973    'FishBread'  
    'f'     6.6    0.014649    'CheeseFish' 
    'g'    -7.6    0.014072    'BreadCheese'
    'h'       8    0.014013    'FishBread'  

Upvotes: 1

Related Questions