EBH

Reputation: 10440

How to replace many matches in the text with different values using regexprep in Matlab

I'm using the function regexprep in Matlab to replace several instances of a pattern with a list of values from a cell array. The idea is to replace the first match with the first value, the second match with the next one, and so on, so that each match is replaced with a different value from the cell array.

From the documentation I read that:

If replace is a cell array of N character vectors and expression is a single character vector, then regexprep attempts N matches and replacements.

So here is an example of the task I have (for this example let's assume that I know there are only 4 matches):

% some text:
str = 'abc s;dlf kudnbv. soergi; abcva/.lge roins.br oianabca/ sergosr toibnsabc';
pattern = '([a][b][c])'; % the pattern to match
values = {'111','222','333','444'}; % the cell array
new_str = regexprep(str,pattern,values) % the actual replacement

The result:

new_str =
    '111 s;dlf kudnbv. soergi; 111va/.lge roins.br oian111a/ sergosr toibns111'

This result is not correct, of course, because all the matches were replaced by the first value in the cell array.

So I searched for this problem and found this explanation. Apparently regexprep executes the replacements one by one, so after the first replacement, the first match found is what was originally the second match, and because it is recognized as the first one, it is replaced by the first value in the cell array (111).

I can work around this with a loop that performs the replacement with a different value each time:

new_str = str;
for k = 1:numel(values)
    new_str = regexprep(new_str,pattern,values{k},'once'); % replace one match per iteration
end

The result:

new_str =
    '111 s;dlf kudnbv. soergi; 222va/.lge roins.br oian333a/ sergosr toibns444'

which is exactly what I want.

My question is how to write the pattern, or how to use regexprep, to achieve the same result without a loop.
It seems to me that I'm missing something about how to use this function. I'll add that my real problem has over 100 matches in the text, so using a pattern like ([a][b][c])(.*)([a][b][c])(.*)([a][b][c])(.*)([a][b][c]) and a replacement pattern like 111$2222$4333$6444 (which gives the correct result here) is not really an option.

Any help will be appreciated!

Upvotes: 4

Views: 449

Answers (3)

EBH

Reputation: 10440

Using @excaza's idea, I wrote a simpler implementation for people (like me) who don't really use OOP in Matlab:

We start with a helper function that remembers the index from the previous call to the function and returns the next cell from its input strCellArray:

function out = nextStr(strCellArray)
    persistent n % index of the next value to return; kept between calls
    if isempty(n) || n>numel(strCellArray)
        n = 1;
    end
    out = strCellArray{n};
    n = n+1;
end

Then we can just write:

values = {'111','222','333','444'}; % the cell array
new_str = regexprep(str,pattern,'${nextStr(values)}'); % the command inside ${...} is evaluated for every match
clear nextStr % to reset the counter in the function

and get the same result:

111 s;dlf kudnbv. soergi; 222va/.lge roins.br oian333a/ sergosr toibns444

The subtle point here is that although regexprep is called only once, the command in its last argument is evaluated once for each match, so nextStr runs N times in succession.
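As a minimal sketch of that per-match evaluation, independent of the helper above: the command inside ${...} is run separately for each match, with $0 standing for the text of the current match. The variable names demo and lengths are made up for this illustration.

```matlab
% Each match triggers its own evaluation of the ${...} command;
% $0 is replaced by the text of the match being processed.
demo = 'a bb ccc';
lengths = regexprep(demo, '\w+', '${num2str(length($0))}');
disp(lengths)   % every word is replaced by its own length: 1 2 3
```

Because the command is re-evaluated for every match, a function that keeps state between calls (such as nextStr) can return a different replacement each time.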

Upvotes: 2

Cris Luengo

Reputation: 60444

According to the documentation linked in the question, regexprep(str,pattern,values), with values a cell array of strings and pattern a single string, applies the search and replace once for each element in values. It is thus equivalent to:

str = regexprep(str,pattern,values{1});
str = regexprep(str,pattern,values{2});
str = regexprep(str,pattern,values{3});
... etc.

Each call to regexprep replaces all matches, so after the first replacement pattern is no longer present in str, and the second (and subsequent) replacements find nothing. In contrast, regexprep(...,'once') replaces only the first match.

Thus:

str = 'abc s;dlf kudnbv. soergi; abcva/.lge roins.br oianabca/ sergosr toibnsabc abc/abc';
pattern = '([a][b][c])'; % the pattern to match
values = {'111','222','333','444'};
new_str = regexprep(str,pattern,values,'once')

will do exactly as desired:

new_str =
    '111 s;dlf kudnbv. soergi; 222va/.lge roins.br oian333a/ sergosr toibns444 abc/abc'

Note that I added two more "abc" elements to the end of the string, and that these did not get replaced. Because values has 4 elements, only the first 4 matches are replaced.
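A compact side-by-side sketch of the two behaviours described in this answer, using a made-up string and made-up variable names (s, vals, allAtOnce, oneEach, rematch):

```matlab
s = 'x-x-x';
vals = {'1','2','3'};

% Without 'once': the first pass replaces every match, later passes find none.
allAtOnce = regexprep(s, 'x', vals);          % '1-1-1'

% With 'once': each pass replaces only the first remaining match.
oneEach = regexprep(s, 'x', vals, 'once');    % '1-2-3'

% Assumed caveat: each pass searches the already modified string, so a
% replacement value that itself contains the pattern may be matched again.
rematch = regexprep(s, 'x', {'xa','b'}, 'once');
disp(rematch)
```

If the pass-by-pass reading above holds, the 'once' form is only safe when none of the replacement values can themselves match the pattern, as is the case for '111', '222', and so on.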

Upvotes: 3

sco1

Reputation: 12214

You could make a basic helper string generator and use the command execution replacement token.

For example:

classdef strgenerator < handle
    properties
        strs
        ii = 1
    end

    methods
        function self = strgenerator(strs)
            self.strs = strs;
        end

        function outstr = nextstr(self)
            outstr = self.strs{self.ii};

            self.ii = self.ii + 1;
            if self.ii > numel(self.strs)
                self.ii = 1;
            end
        end
    end
end

And then:

str = 'abc s;dlf kudnbv. soergi; abcva/.lge roins.br oianabca/ sergosr toibnsabc';
pattern = '([a][b][c])'; % the pattern to match
values = strgenerator({'111','222','333','444'}); % generator wrapping the cell array
new_str = regexprep(str,pattern,'${values.nextstr()}') % the actual replacement

Provides us with:

>> SOcode

new_str =

    '111 s;dlf kudnbv. soergi; 222va/.lge roins.br oian333a/ sergosr toibns444'

Upvotes: 4
