Reputation: 10440
I'm using the function regexprep in MATLAB to replace several instances of a pattern with a list of values from a cell array. The idea is to replace the first match with the first value, the second with the next one, and so on, so that each match is replaced with a different value from the cell array.
From the documentation I read that:
If replace is a cell array of N character vectors and expression is a single character vector, then regexprep attempts N matches and replacements.
So here is an example of the task I have (for this example let's assume that I know there are only 4 matches):
% some text:
str = 'abc s;dlf kudnbv. soergi; abcva/.lge roins.br oianabca/ sergosr toibnsabc';
pattern = '([a][b][c])'; % the pattern to match
values = {'111','222','333','444'}; % the cell array
new_str = regexprep(str,pattern,values) % the actual replace
The result:
new_str =
'111 s;dlf kudnbv. soergi; 111va/.lge roins.br oian111a/ sergosr toibns111'
This result is not correct, of course, because all the matches were replaced by the first value in the cell array.
So I googled this problem and found this explanation. Apparently regexprep performs the replacements one by one, so after the first replacement, the first match found is what was originally the second match. Because it is now treated as the first match, it is replaced by the first value in the cell array (111).
I can work around this with a loop that performs this task with a different value each time:
new_str = str;
for k = 1:numel(values)
    new_str = regexprep(new_str,pattern,values(k),'once'); % replace one value each time
end
The result:
new_str =
'111 s;dlf kudnbv. soergi; 222va/.lge roins.br oian333a/ sergosr toibns444'
which is exactly what I want.
My question is: how can I write the pattern, or use regexprep, to achieve the same result without a loop?
It seems to me that I'm missing something about how to use this function. I'll also add that my real problem has over 100 matches within the text, so using a pattern like ([a][b][c])(.*)([a][b][c])(.*)([a][b][c])(.*)([a][b][c]) and a replace pattern like 111$2222$4333$6444 (which gives the correct result here) is not really an option.
Any help will be appreciated!
Upvotes: 4
Views: 449
Reputation: 10440
Using @excaza's idea, I wrote a simpler implementation for people (like me) who don't really use OOP in MATLAB.
We start with a helper function that remembers the index from its previous call and returns the next cell from its input strCellArray:
function out = nextStr(strCellArray)
    persistent n % index of the next value, kept between calls
    if isempty(n) || n>numel(strCellArray)
        n = 1; % first call, or wrap around after the last value
    end
    out = strCellArray{n};
    n = n+1;
end
Then we can just write:
values = {'111','222','333','444'}; % the cell array
new_str = regexprep(str,pattern,'${nextStr(values)}'); % evaluate the command between {...} for every match
clear nextStr % to reset the counter in the function
and get the same result:
111 s;dlf kudnbv. soergi; 222va/.lge roins.br oian333a/ sergosr toibns444
The tricky thing here is to notice that although we call regexprep only once, the command in the last argument is evaluated once for every match, so nextStr is called N times in succession.
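As a quick check that the ${...} command really is evaluated separately for each match, here is a minimal sketch that uses only the built-in upper and the $0 token (the text of the current match). The input string and the variable name demo are made up for illustration:

```matlab
% The command inside ${...} runs once per match; $0 is the current match.
% Each match therefore gets its own, different replacement text.
demo = regexprep('abc x abd', 'ab\w', '${upper($0)}');
disp(demo)   % ABC x ABD
```

Since the two matches produce different outputs, the command cannot have been evaluated just once up front, which is what lets a stateful helper like nextStr hand out a new value each time.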
Upvotes: 2
Reputation: 60444
According to the documentation linked in the question, regexprep(str,pattern,values), with values a cell array of strings and pattern a single string, applies the search and replace once for each element in values. It is thus equivalent to:
str = regexprep(str,pattern,values{1});
str = regexprep(str,pattern,values{2});
str = regexprep(str,pattern,values{3});
% ... and so on for the remaining elements of values
Each of these calls to regexprep replaces all matches. After the first replacement, pattern is therefore no longer present in str, so the second (and subsequent) replacements don't find any matches. In contrast, regexprep(...,'once') replaces only the first match.
Thus:
str = 'abc s;dlf kudnbv. soergi; abcva/.lge roins.br oianabca/ sergosr toibnsabc abc/abc';
pattern = '([a][b][c])'; % the pattern to match
values = {'111','222','333','444'};
new_str = regexprep(str,pattern,values,'once')
will do exactly as desired:
new_str =
'111 s;dlf kudnbv. soergi; 222va/.lge roins.br oian333a/ sergosr toibns444 abc/abc'
Note that I added two more "abc" elements to the end of the string, and that these did not get replaced. Because values has 4 elements, only the first 4 matches are replaced.
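To make the equivalence concrete, here is a small sketch that compares the cell-array call with 'once' against the explicit loop from the question. The shortened test string and the variable names a and b are only for illustration:

```matlab
str = 'abc one abc two abc three abc four abc five';
pattern = 'abc';
values = {'111','222','333','444'};

% Cell-array form: one 'once' replacement per element of values.
a = regexprep(str, pattern, values, 'once');

% Explicit loop doing the same thing one value at a time.
b = str;
for k = 1:numel(values)
    b = regexprep(b, pattern, values{k}, 'once');
end

disp(isequal(a, b))   % 1 if both approaches agree
```

One caveat that seems to follow from this sequential behaviour: if a replacement value itself matches pattern (say, 'abcX'), the next pass would find that inserted text first, so this approach is safest when the values cannot match the pattern.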
Upvotes: 3
Reputation: 12214
You could make a basic helper string generator and use the command execution replacement token.
For example:
classdef strgenerator < handle
    properties
        strs
        ii = 1
    end
    methods
        function self = strgenerator(strs)
            self.strs = strs;
        end
        function outstr = nextstr(self)
            outstr = self.strs{self.ii};
            self.ii = self.ii + 1;
            if self.ii > numel(self.strs)
                self.ii = 1;
            end
        end
    end
end
And
str = 'abc s;dlf kudnbv. soergi; abcva/.lge roins.br oianabca/ sergosr toibnsabc';
pattern = '([a][b][c])'; % the pattern to match
values = strgenerator({'111','222','333','444'}); % generator over the cell array
new_str = regexprep(str,pattern,'${values.nextstr()}') % the actual replace
Provides us with:
>> SOcode
new_str =
'111 s;dlf kudnbv. soergi; 222va/.lge roins.br oian333a/ sergosr toibns444'
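If keeping state in a helper object is not wanted, a different technique (not the generator approach above) might be to split the text around the matches with regexp(...,'split') and interleave the pieces with the values. This is only a sketch; the names pieces, n and parts are made up here, and it assumes values is a row cell array with at least as many elements as there are matches:

```matlab
str = 'abc s;dlf kudnbv. soergi; abcva/.lge roins.br oianabca/ sergosr toibnsabc';
pattern = 'abc';
values = {'111','222','333','444'};

% Split the text around every match; 4 matches give 5 pieces
% (pieces at the very start or end may be empty).
pieces = regexp(str, pattern, 'split');
n = numel(pieces) - 1;                       % number of matches found
assert(n <= numel(values), 'Not enough replacement values.');

% Stack pieces over values so that column-major expansion interleaves them:
% piece1, value1, piece2, value2, ..., then append the last piece.
parts = [pieces(1:n); values(1:n)];
new_str = [parts{:}, pieces{end}];
disp(new_str)
```

Unlike the generator, this never re-scans replaced text and needs no counter to reset, at the cost of building the result by hand.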
Upvotes: 4