I'm trying to clean a string with the "Replace in string" step in PDI (Kettle).
The input string looks like this:
<p class="MsoNormal" style="FONT-SIZE: 11pt; mso-ansi-language: ES"> AAA <p></p></span></p> <p class="MsoNormal" style="FONT-SIZE: 11pt; mso-ansi-language: ES"> BBB <personname w:st="on"> CCC.
I want to delete every part of the string from a '<' to the next '>' (the brackets included), to get this:
AAA BBB CCC.
Looking through similar questions, I tried the approach from this one: Replace string using regular expression in KETTLE
In a "Replace in string" step, I enabled RegEx, searched for (<(.*)>) and left the replacement empty.
But the problem is that it deletes everything between the first '<' and the last '>' characters, and the output is:
CCC.
How should I build the RegEx expression?
The problem is that your (.*) is greedy, so it captures everything up to the last >.
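Kettle runs on Java, and as far as I know the step's RegEx option follows java.util.regex syntax, so the greedy behaviour can be reproduced outside PDI with a small standalone sketch. This is not the step's own code; the class and method names (GreedyTagDemo, greedyFirstMatch, greedyReplace) are made up for illustration. It shows that the question's pattern finds one single match running from the first '<' to the last '>', so replacing it with nothing leaves only the trailing text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyTagDemo {
    // Sample input copied from the question.
    static final String INPUT =
        "<p class=\"MsoNormal\" style=\"FONT-SIZE: 11pt; mso-ansi-language: ES\"> AAA <p></p></span></p> "
        + "<p class=\"MsoNormal\" style=\"FONT-SIZE: 11pt; mso-ansi-language: ES\"> BBB <personname w:st=\"on\"> CCC.";

    // The pattern from the question, with a greedy (.*).
    static final String GREEDY = "(<(.*)>)";

    // Text of the first (and, for this input, only) match of the greedy pattern.
    static String greedyFirstMatch() {
        Matcher m = Pattern.compile(GREEDY).matcher(INPUT);
        return m.find() ? m.group() : null;
    }

    // What is left after replacing every match with an empty string.
    static String greedyReplace() {
        return INPUT.replaceAll(GREEDY, "");
    }

    public static void main(String[] args) {
        // One match spanning from the first '<' to the last '>' of the input.
        System.out.println(greedyFirstMatch());
        // Only the text after the last '>' survives; prints "[ CCC.]".
        System.out.println("[" + greedyReplace() + "]");
    }
}
```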
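To stop each match at the first > instead, you can either:
make the quantifier lazy: (<(.*?)>)
or use a negated character class, which can never run past a >: (<([^>]*)>)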
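Either should work and produce as output
AAA BBB CCC.
although the spaces that surrounded the removed tags are left behind, so you may want to collapse repeated whitespace and trim the result afterwards.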
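Here is a standalone sketch comparing both patterns, again assuming the step uses java.util.regex semantics. TagStripDemo, strip and tidy are hypothetical names. The tidy helper is an extra clean-up that is not part of the patterns above; in PDI it would correspond to a second replace (for example \s+ replaced by a single space) or a trim done in a separate step.

```java
public class TagStripDemo {
    // Sample input copied from the question.
    static final String INPUT =
        "<p class=\"MsoNormal\" style=\"FONT-SIZE: 11pt; mso-ansi-language: ES\"> AAA <p></p></span></p> "
        + "<p class=\"MsoNormal\" style=\"FONT-SIZE: 11pt; mso-ansi-language: ES\"> BBB <personname w:st=\"on\"> CCC.";

    // Removes every match of the regex, like a regex replace with an empty replacement.
    static String strip(String regex) {
        return INPUT.replaceAll(regex, "");
    }

    // Hypothetical extra clean-up: collapse whitespace runs to one space and trim the ends.
    static String tidy(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String lazy = strip("(<(.*?)>)");       // lazy quantifier
        String negated = strip("(<([^>]*)>)");  // negated character class
        System.out.println(lazy.equals(negated)); // true
        System.out.println("[" + lazy + "]");      // [ AAA   BBB  CCC.]
        System.out.println(tidy(lazy));            // AAA BBB CCC.
    }
}
```

Both patterns remove each tag separately, so every piece of text between tags survives; only the whitespace needs the extra pass.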