Gbru

Reputation: 97

Search a file for a string and put it into an array

The goal of the program is to look through a file for a string and print every instance of that string along with the line it was found in.

I've gotten it to search through the file and find the matches, but I can't get them into an array or anything else that lets me store all of them. Right now it only gives me the last instance; I can easily put in a break between line 8 and 9 to get the first instance instead.

If anyone knows how to store every line that contains the string in question, that would be a great help.

fid = fopen('....... file directory....')
prompt = 'What string are you searching for?  ';
str = input(prompt,'s');

i=0;
for j=1:10000;
tline = fgetl(fid);             %Returns next line of specified file
counter = counter + 1;          %Counts the next line
    if ischar(tline);           %Checks if the line is an array
    U=strfind(tline,str);       %Sets U to be 1 if strfind finds the string in line tline
        if isfinite(U) == 1                     
            what = tline;       %This is where I want to set the array positions equal to whatever tline is at that time, then move onto the next i and search for the next tline.
            i=i+1;
        end            
    end
end

Upvotes: 1

Views: 162

Answers (2)

Rody Oldenhuis

Reputation: 38032

I would suggest the following:

haystack = 'test.txt';

prompt = 'What string are you searching for?  ';
needle = input(prompt, 's');

% MS Windows
if ispc

    output = regexp(evalc(['!find /N "' needle '" ' haystack]), char(10), 'split');

    output = regexp(output, '^\[(\d+)\]', 'tokens');   % digits only, so a ']' later in the line cannot be swallowed
    output = cellfun(@(x)str2double(x{:}), output(~cellfun('isempty', output)))';

% OSX/Linux   
elseif isunix

    output = regexp(evalc(['!grep -n "' needle '" ' haystack]), char(10), 'split');

    output = regexp(output, '^(\d+):', 'tokens');      % digits only, so a ':' later in the line cannot be swallowed
    output = cellfun(@(x)str2double(x{:}), output(~cellfun('isempty', output)))';

% Anything else: stay in MATLAB
else

    fid = fopen(haystack);
    output = textscan(fid, '%s', 'delimiter', '\n');
    fclose(fid);

    output = find(~cellfun('isempty', regexp(output{1}, needle)));

end

Contents of test.txt:

garbage garbage garbage 
garbage garbage garbage 
garbage garbage garbage 
garbage garbage garbage 
garbage garbage garbage 
garbage valuable garbage 
garbage garbage garbage 
garbage garbage valuable 
garbage garbage garbage 
garbage garbage garbage 
garbage garbage garbage 
garbage garbage garbage 
garbage valuable garbage 
garbage garbage garbage 
garbage garbage garbage 
garbage garbage garbage 

When I execute the code on Windows or Linux or force the MATLAB version, with needle = 'valuable', I get the correct line numbers:

output = 
     6
     8
    13

The advantage of using the OS-specific tools is that they have a far smaller memory footprint than the pure MATLAB version, because they don't load the entire file into memory. Even if you extended the MATLAB code to avoid this (by using a loop with fgetl, for instance), the OS-specific tools would still be quite a bit faster (and still more memory friendly); that's why I put the pure MATLAB version as a last resort :)
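A line-by-line loop of the kind mentioned above might look like the following sketch. It reads with fgetl so that only one line is held in memory at a time, and it does a literal substring match with strfind rather than a regular-expression match. The function name findLinesStreaming is hypothetical (it is not part of the code above), and the function is assumed to be saved in its own file findLinesStreaming.m.

```matlab
function lineNumbers = findLinesStreaming(haystack, needle)
% Hypothetical helper: returns the numbers of all lines in the file
% 'haystack' that contain the literal string 'needle'.
    fid = fopen(haystack, 'r');
    if fid == -1
        error('Could not open file: %s', haystack);
    end
    closer = onCleanup(@() fclose(fid));    % close the file even if an error occurs

    lineNumbers = zeros(0, 1);              % column vector of matching line numbers
    lineNo = 0;
    tline = fgetl(fid);                     % fgetl returns numeric -1 at end of file
    while ischar(tline)
        lineNo = lineNo + 1;
        if ~isempty(strfind(tline, needle)) % literal match, no regex interpretation
            lineNumbers(end + 1, 1) = lineNo; %#ok<AGROW>
        end
        tline = fgetl(fid);
    end
end
```

With the test.txt shown above, findLinesStreaming('test.txt', 'valuable') should return the same line numbers as the other branches (6, 8 and 13).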

Upvotes: 1

Marcin

Reputation: 238309

You can store them in a struct array, for example:

lines = struct([]); % to store lines and line numbers    

idx = 1;

fid = fopen('somefile.txt');

tline = fgets(fid);

while ischar(tline)

    U = strfind(tline, str);   % str is the search string read with input() in the question

    if numel(U) > 0                            

        lines(end + 1).line = tline; % save line
        lines(end).lineNo = idx;     % save its number 
        lines(end).U = U;            % where is str in the just saved line            

    end

    tline = fgets(fid);

    idx = idx + 1;

end

fclose(fid);

lineTxts    = {lines(:).line};   % get lines in a cell
lineNumbers = [lines(:).lineNo]; % get line numbers as matrix
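Once the struct array is filled, printing each match together with its line number could be sketched as below. The sample data here is made up to stand in for the struct array built by the loop above, and report is a hypothetical variable name. Note that fgets keeps the line terminator (unlike fgetl), so the sketch strips it before printing.

```matlab
% Made-up sample data with the same fields as the struct array above
lines = struct( ...
    'line',   {sprintf('garbage valuable garbage\n'), sprintf('garbage garbage valuable\n')}, ...
    'lineNo', {6, 8}, ...
    'U',      {9, 17});

report = cell(numel(lines), 1);
for k = 1:numel(lines)
    % Remove the trailing newline / carriage return kept by fgets
    txt = regexprep(lines(k).line, '[\r\n]+$', '');
    report{k} = sprintf('%d: %s', lines(k).lineNo, txt);
end

fprintf('%s\n', report{:});   % one "lineNo: text" entry per line
```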

Upvotes: 0
