R roaring

Reputation: 9

Issue in regular expressions

e
Material     newvar1
5000         4.28 > 5 > 5
5001         3 > 2 > 3 > 3

When I run the code below:

e$newvar2=sapply(str_extract_all(gsub("\\b(\\c+\\.\\c+)(?:\\s+>\\s+\\1\\b)+", "\\1", e$newvar1), "\\d+\\.\\d+"), paste, collapse=" > ")

I get the wrong output:

e
Material     newvar1          newvar2
5000         4.28 > 5         4.28
5001         3 > 2 > 3 > 3    3 > 2 > 3

Instead, I need this:

e
Material     newvar1          newvar2
5000         4.28 > 5         4.28 > 5
5001         3 > 2 > 3 > 3    3 > 2 > 3

Upvotes: 1

Views: 35

Answers (1)

Tim Biegeleisen

Reputation: 522299

We can try using str_replace_all from the stringr library. Match the following pattern, then remove each match by replacing it with an empty string:

(\\d+(?:\\.\\d+)?) > (?=\\1)

This matches and captures a number along with the > separator that follows it. The lookahead (?=\\1) then checks whether the same number comes next; if it does, the captured number and its separator are removed, leaving only the later copy.

That is, 3 > 3 just becomes 3.
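To tie this back to the question, here is a minimal sketch that applies the same pattern to the newvar1 column. It assumes the data frame can be rebuilt from the values shown in the question, and it swaps in base R's gsub with perl = TRUE (needed for the lookahead) so it runs without stringr; the name dedupe_pattern is only illustrative.

```r
# Rebuild the asker's data frame (values copied from the question; this is an assumption
# about how `e` was constructed).
e <- data.frame(
  Material = c(5000, 5001),
  newvar1  = c("4.28 > 5 > 5", "3 > 2 > 3 > 3"),
  stringsAsFactors = FALSE
)

# Same pattern as in the answer; `dedupe_pattern` is a hypothetical name.
dedupe_pattern <- "(\\d+(?:\\.\\d+)?) > (?=\\1)"

# gsub is vectorised over the column; perl = TRUE enables the lookahead.
e$newvar2 <- gsub(dedupe_pattern, "", e$newvar1, perl = TRUE)

print(e$newvar2)
# expected values: "4.28 > 5" and "3 > 2 > 3"
```

Because the replacement works on the whole string, there is no need for the str_extract_all / paste round trip from the question.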

library(stringr)

x <- "3 > 2 > 3.28 > 3.28 > 1.5 > 1.5"
str_replace_all(x, "(\\d+(?:\\.\\d+)?) > (?=\\1)", "")

[1] "3 > 2 > 3.28 > 1.5"
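One caveat that may be worth checking against your data: the lookahead only tests that the next value starts with the captured text, so 3 followed by 3.28 is treated as a repeat, and so is the trailing 3 of 13 followed by 3. The sketch below is one possible hardening, not part of the original answer. It adds a lookbehind and a negative lookahead so that only whole numbers are compared; strict_pattern and the test vector y are hypothetical names, and base R's gsub with perl = TRUE is used so the example has no package dependency.

```r
# Pattern from the answer, for comparison.
original_pattern <- "(\\d+(?:\\.\\d+)?) > (?=\\1)"

# Hypothetical stricter variant:
#   (?<![\\d.])  the captured number must not start in the middle of another number
#   (?![\\d.])   the repeated number must not continue with more digits or a decimal part
strict_pattern <- "(?<![\\d.])(\\d+(?:\\.\\d+)?) > (?=\\1(?![\\d.]))"

y <- c("3 > 3.28", "13 > 3", "1.5 > 1.5 > 2")

loose  <- gsub(original_pattern, "", y, perl = TRUE)
strict <- gsub(strict_pattern, "", y, perl = TRUE)

print(loose)   # values: "3.28", "13", "1.5 > 2"  (first two are over-eager removals)
print(strict)  # values: "3 > 3.28", "13 > 3", "1.5 > 2"
```

If the column only ever holds values like those in the question, the simpler pattern gives the same result; the stricter form matters only when one value can be a prefix or suffix of its neighbour.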

Upvotes: 1
