Reputation: 27
I have a large text file that looks like this:
PMID- 123456123
OWN - NLM
DA - 20160930

PMID- 27689094
OWN - NLM
VI - 2016
DP - 2016

PMID- 27688828
OWN - NLM
STAT- Publisher
DA - 20160930
LR - 20160930
and so on... I would like to split the text file into smaller text files at every blank line, and name each smaller file after the PMID number it contains, so the result looks like this:
filename '123456123.txt' contains:
PMID- 123456123
OWN - NLM
DA - 20160930
filename '27689094.txt' contains:
PMID- 27689094
OWN - NLM
VI - 2016
DP - 2016
filename '27688828.txt' contains:
PMID- 27688828
OWN - NLM
STAT- Publisher
DA - 20160930
LR - 20160930
This is my attempt. I know how to identify blank lines (I think), but I don't know how to split the text and save each part as a smaller text file:
fid = fopen(filename);
text = fgets(fid);
blankline = sprintf('\r\n');
while ischar(text)
    if strcmp(blankline, text)
        % split the text
    else
        % write the text to the smaller file
    end
    text = fgets(fid);  % read the next line; fgets returns -1 at end of file
end
fclose(fid);
Upvotes: 0
Views: 1041
Reputation: 65460
You can read in the entire file and then use regexp to split the contents at empty lines. You can then use regexp again to extract the PMID of each group, and then loop through all the pieces and save them. Processing the file as one large string like this is likely to be faster than using fgets to read it line by line.
% Tell it what folder you want to put the files in
outdir = '/my/folder';
% Read the initial file in all at once
fid = fopen(filename, 'r');
data = fread(fid, '*char').';
fclose(fid);
% Break it into pieces based upon empty lines
pieces = regexp(data, '\n\s*\n', 'split');
% For each piece get the PMID (\d+ so that a match always contains digits)
pmids = regexp(pieces, '(?<=PMID-\s*)\d+', 'match', 'once');
% Now loop through and save each one
for k = 1:numel(pieces)
    % Skip pieces with no PMID (e.g. trailing whitespace at the end of the file)
    if isempty(pmids{k}), continue; end
    % Use the PMID of this piece to construct the output filename
    outfile = fullfile(outdir, [pmids{k}, '.txt']);
    % Now write the piece to the file
    fid = fopen(outfile, 'w');
    fwrite(fid, pieces{k});
    fclose(fid);
end
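If the file is too large to hold in memory comfortably, the line-by-line approach from the question can also be made to work. The following is only a minimal sketch, not part of the original answer: infile, the 'split_records' folder, and the helper writeRecord are hypothetical names chosen for illustration, and it assumes MATLAB R2016b or later so that a local function can live at the end of a script (on older releases, writeRecord would need to go in its own file).

```matlab
% Hypothetical input file and output folder; adjust to your own paths
infile = 'records.txt';
outdir = 'split_records';
if ~exist(outdir, 'dir')
    mkdir(outdir);
end

fin = fopen(infile, 'r');
record = {};                        % lines of the record being collected
tline = fgetl(fin);                 % fgetl drops the newline; returns -1 at end of file
while ischar(tline)
    if isempty(strtrim(tline))      % blank line: the current record is complete
        writeRecord(record, outdir);
        record = {};
    else
        record{end+1} = deblank(tline);  %#ok<SAGROW> strip trailing whitespace
    end
    tline = fgetl(fin);
end
writeRecord(record, outdir);        % flush the last record if the file has no trailing blank line
fclose(fin);

function writeRecord(record, outdir)
    % Write one record to <PMID>.txt; do nothing for empty or PMID-less records
    if isempty(record), return; end
    pmid = regexp(strjoin(record, ' '), 'PMID-\s*(\d+)', 'tokens', 'once');
    if isempty(pmid), return; end
    fout = fopen(fullfile(outdir, [pmid{1}, '.txt']), 'w');
    fprintf(fout, '%s\n', record{:});
    fclose(fout);
end
```

This keeps only one record in memory at a time, at the cost of more file I/O calls than the whole-file regexp version.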
Upvotes: 2