I've written the following function for importing Excel files into MATLAB. The function works fine: given the path name of the folder containing the files, it imports them into the workspace. The function is shown below:
```matlab
function Data = xls_function(pathName)
% Script imports the relevant .xls files into MATLAB. Ensure that the .xls
% files are stored in subfolders of the folder specified by 'pathName'.
%--------------------------------------------------------------------------
TopFolder = pathName;
dirListing = dir(TopFolder);    % list the folders in the directory pathName
dirListing = dirListing(3:end); % remove the first two entries ('.' and '..')
for i = 1:length(dirListing)
    SubFolder{i} = dirListing(i,1).name;                          % name of each folder
    SubFolderPath{i} = fullfile(pathName, dirListing(i,1).name);  % path of each folder
    ExcelFile{i} = dir(fullfile(SubFolderPath{i},'*.xls'));       % .xls files in each folder
    for j = 1:length(ExcelFile{1,i})
        ExcelFileName{1,i}{j,1} = ExcelFile{1,i}(j,1).name;       % name of each .xls file
        for k = 1:length(ExcelFileName)
            for m = 1:length(ExcelFileName{1,k})
                % gather information on the .xls files, i.e. worksheet names
                [status{1,k}{m,1},sheets{1,k}{m,1},format{1,k}{m,1}] = ...
                    xlsfinfo(fullfile(pathName,SubFolder{1,k},ExcelFileName{1,k}{m,1}));
                % name of the last worksheet in each spreadsheet
                Name_worksheet{1,k}{m,1} = sheets{1,k}{m,1}{1,end};
            end
        end
    end
end
for n = 1:length(ExcelFileName)
    for o = 1:length(ExcelFileName{1,n})
        % two loops are required as the number of Excel spreadsheets varies
        % from the number of worksheets in each spreadsheet.
        % Import the data using the spreadsheet and worksheet names obtained above.
        TXT{1,n}{o,1} = xlsread(fullfile(pathName,SubFolder{1,n},...
            ExcelFileName{1,n}{o,1}),Name_worksheet{1,n}{o,1});
        Data.(SubFolder{n}){o,1} = TXT{1,n}{o,1};
    end
end
```
The only problem with the script is that it takes too long to run when the number of .xls files is large. I've read that vectorization can improve running time, so I'm asking for any advice on how I could alter this code to run faster through vectorization.
I realise that reading code like this isn't easy (especially as my coding is by no means as efficient as I would like), but any advice would be much appreciated.
Answer:
I don't think vectorization applies to your problem, but let's take one thing at a time.
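For example, you could use cellfun to turn one of your loops into a single "vectorized" call (cellfun still calls the function once per element internally, so this mainly tidies the code rather than speeding it up):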
```matlab
tmp = ExcelFileName{1,n};   % file names in subfolder n
result_cell = cellfun(@(x) xlsread(fullfile(pathName, SubFolder{1,n}, x)), tmp, 'UniformOutput', false);
```
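Independently of vectorization, the question's code has a structural cost worth removing first: the k/m loops that call xlsfinfo sit inside the j loop, so every time a new file is found, xlsfinfo is run again on all files collected so far. Below is a minimal sketch of the same import with the loops un-nested, so each file goes through xlsfinfo and xlsread exactly once. The function name xls_function_flat is hypothetical; it keeps the question's choice of reading the last worksheet and its Data.(SubFolder){o,1} output layout, and it filters the dir results to real subfolders instead of assuming '.' and '..' come first.

```matlab
function Data = xls_function_flat(pathName)
% Hypothetical restructuring of the question's xls_function: each .xls file
% is passed to xlsfinfo and xlsread exactly once (no re-scanning of files
% already seen). Output layout matches: Data.(SubFolder){o,1}.
dirListing = dir(pathName);
% keep real subfolders only, rather than assuming '.' and '..' are first
isSub = [dirListing.isdir] & ~ismember({dirListing.name}, {'.', '..'});
dirListing = dirListing(isSub);
Data = struct();
for i = 1:numel(dirListing)
    SubFolder = dirListing(i).name;
    ExcelFile = dir(fullfile(pathName, SubFolder, '*.xls'));
    for o = 1:numel(ExcelFile)
        fileName = fullfile(pathName, SubFolder, ExcelFile(o).name);
        [~, sheets] = xlsfinfo(fileName);   % worksheet names, read once per file
        % read the last worksheet, as the question's code does
        Data.(SubFolder){o,1} = xlsread(fileName, sheets{end});
    end
end
end
```

This still pays the per-call Excel cost described next, but only once per file instead of once per file per newly found file.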
But the key problem is the poor implementation of xlsread and the other Excel-related functions in MATLAB: with every(!) function call they create a new (hidden) Excel process, perform your command in it, and then close it again.
I remember a tool on MATLAB Central that reused the same Excel instance and was therefore very quick, but unfortunately I can no longer find it. Maybe you can find an example there on which to base your own reader that reuses a single instance.
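A reader that reuses one Excel instance can be sketched with MATLAB's COM interface (actxserver, Windows only, Excel must be installed). This is a minimal sketch, not the MATLAB Central tool mentioned above; the function name read_xls_one_excel is hypothetical. It starts Excel once, opens each workbook in turn, reads the named worksheet's UsedRange, and closes the workbook without saving.

```matlab
function data = read_xls_one_excel(fileNames, sheetNames)
% READ_XLS_ONE_EXCEL  Hypothetical helper: read one worksheet from each of
% several .xls files while starting a single Excel process.
%   fileNames  - cell array of absolute paths (Excel does not resolve
%                paths relative to MATLAB's current folder)
%   sheetNames - cell array, worksheet name to read from each file
%   data       - cell array; data{k} holds the raw UsedRange values of file k
excel = actxserver('Excel.Application');     % one hidden Excel instance for all files
cleanup = onCleanup(@() closeExcel(excel));  % quit Excel even if an error occurs
excel.DisplayAlerts = false;                 % suppress Excel dialog boxes
data = cell(size(fileNames));
for k = 1:numel(fileNames)
    wb = excel.Workbooks.Open(fileNames{k});
    try
        sheet = wb.Worksheets.Item(sheetNames{k});
        % UsedRange starts at the first used cell, not necessarily A1
        data{k} = sheet.UsedRange.Value;
    catch err
        wb.Close(false);                     % close without saving, then re-raise
        rethrow(err);
    end
    wb.Close(false);                         % close without saving
end
end

function closeExcel(excel)
excel.Quit;
delete(excel);
end
```

With the question's variables, the inputs for subfolder n would be fullfile(pathName, SubFolder{n}, ExcelFileName{n}) and Name_worksheet{n} (pathName must be absolute). UsedRange.Value returns raw cell contents rather than xlsread's numeric matrix, so a conversion such as cell2mat may be needed for purely numeric sheets.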
On a related note, Excel has the annoying limitation that it won't open two files with the same name at the same time; it fails with an error instead. So if you run your reading vectorized/in parallel, you are in for a whole new world of strange errors :D
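Given that limitation, one cheap precaution before reading files concurrently is to check whether any file name occurs in more than one subfolder, which is easy to hit with the question's folder layout. A minimal sketch; the helper name find_duplicate_xls_names is hypothetical, and the lower-casing assumes Excel compares workbook names case-insensitively, as Windows does for file names.

```matlab
function dupNames = find_duplicate_xls_names(fileNames)
% FIND_DUPLICATE_XLS_NAMES  Hypothetical helper: return the file names
% (folder stripped) that occur more than once in FILENAMES, a cell array of
% paths. Files sharing a name should not be opened in Excel at the same time.
[~, base, ext] = cellfun(@fileparts, fileNames, 'UniformOutput', false);
% assumption: names are matched case-insensitively, as Windows does
names = lower(strcat(base, ext));
[uniqueNames, ~, idx] = unique(names);
counts = accumarray(idx(:), 1);   % how often each distinct name occurs
dupNames = uniqueNames(counts > 1);
end
```

Files with a name in dupNames could then be read one after the other, and the rest in parallel.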
For myself, I found that the only proper way to deal with these documents was through Java with the Apache POI libraries. These have the nice advantage that you don't need Excel installed, but unfortunately they require some programming.
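Because MATLAB can call Java classes directly, POI can be used without leaving MATLAB. The following is a minimal sketch under stated assumptions: the POI jar (plus whatever jars your POI version depends on) is already on the Java class path, the jar path shown in the comment is a placeholder, and the function name read_xls_sheet_poi is hypothetical. It reads one worksheet of an .xls file (HSSF format) into a numeric matrix, leaving NaN where a cell is missing or not numeric.

```matlab
function data = read_xls_sheet_poi(fileName, sheetName)
% READ_XLS_SHEET_POI  Hypothetical helper: read one worksheet of an .xls
% file into a numeric matrix with Apache POI via MATLAB's Java interface.
% Assumes POI was added beforehand, e.g.
%   javaaddpath('/path/to/poi.jar')   % placeholder path and file name
% Text cells read as NaN; cells that exist but are blank read as 0
% (getNumericCellValue returns 0 for blank cells).
stream = java.io.FileInputStream(fileName);
cleanup = onCleanup(@() stream.close());
wb = org.apache.poi.hssf.usermodel.HSSFWorkbook(stream);   % HSSF = .xls format
sheet = wb.getSheet(sheetName);
if isempty(sheet)   % Java null arrives in MATLAB as []
    error('read_xls_sheet_poi:noSheet', 'No sheet "%s" in %s', sheetName, fileName);
end
nRows = sheet.getLastRowNum() + 1;   % POI row/column indices are zero-based
% first pass: find the widest row so the result can be preallocated with NaN
nCols = 0;
for r = 0:nRows-1
    row = sheet.getRow(r);
    if ~isempty(row)
        nCols = max(nCols, row.getLastCellNum());   % one past the last cell index
    end
end
data = NaN(nRows, nCols);
% second pass: copy the numeric values
for r = 0:nRows-1
    row = sheet.getRow(r);
    if isempty(row)
        continue
    end
    for c = 0:row.getLastCellNum()-1
        xlsCell = row.getCell(c);
        if isempty(xlsCell)
            continue
        end
        try
            data(r+1, c+1) = xlsCell.getNumericCellValue();
        catch
            % text or other non-numeric cell: leave NaN
        end
    end
end
end
```

Since no Excel process is started, the per-call overhead and the same-name restriction described above do not apply to this route.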