Reputation: 9
e
Material newvar1
5000     4.28 > 5 > 5
5001     3 > 2 > 3 > 3
When I apply the code below:
e$newvar2=sapply(str_extract_all(gsub("\\b(\\c+\\.\\c+)(?:\\s+>\\s+\\1\\b)+", "\\1", e$newvar1), "\\d+\\.\\d+"), paste, collapse=" > ")
I get the wrong output, shown below:
e
Material newvar1        newvar2
5000     4.28 > 5       4.28
5001     3 > 2 > 3 > 3  3 > 2 > 3
Instead, I need the output below:
e
Material newvar1        newvar2
5000     4.28 > 5       4.28 > 5
5001     3 > 2 > 3 > 3  3 > 2 > 3
Upvotes: 1
Views: 35
Reputation: 522299
We can try using str_replace_all from the stringr library. Match the following pattern, then remove it by replacing it with an empty string:
(\\d+(?:\\.\\d+)?) > (?=\\1)
This matches and captures a number (with an optional decimal part), along with the > separator that follows it. The lookahead then checks whether the same number comes next. If it does, the captured number and its > separator are removed, leaving only the later copy. That is, 3 > 3 just becomes 3.
x <- "3 > 2 > 3.28 > 3.28 > 1.5 > 1.5"
str_replace_all(x, "(\\d+(?:\\.\\d+)?) > (?=\\1)", "")
[1] "3 > 2 > 3.28 > 1.5"
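To tie this back to the question, here is a minimal sketch that applies the same pattern to the data frame e. It makes a few assumptions: the data frame is rebuilt from the first table in the question, base R's gsub with perl = TRUE stands in for str_replace_all so that the snippet runs without extra packages, and the strict pattern is a hypothetical variant that is not part of the answer above. The variant adds digit and dot boundaries, because the original lookahead only checks that the next number starts with the captured one, so 3 > 3.28 would also lose its leading 3.

```r
# Rebuild the question's data frame (values assumed from the first table).
e <- data.frame(
  Material = c(5000, 5001),
  newvar1  = c("4.28 > 5 > 5", "3 > 2 > 3 > 3"),
  stringsAsFactors = FALSE
)

# Same pattern as in the answer; base R needs perl = TRUE for the lookahead.
pattern <- "(\\d+(?:\\.\\d+)?) > (?=\\1)"
e$newvar2 <- gsub(pattern, "", e$newvar1, perl = TRUE)
print(e$newvar2)
# [1] "4.28 > 5"  "3 > 2 > 3"

# Hypothetical stricter variant: the captured number must not be preceded by,
# and the repeated number must not be followed by, a digit or a dot.
strict <- "(?<![\\d.])(\\d+(?:\\.\\d+)?) > (?=\\1(?![\\d.]))"

loose_result  <- gsub(pattern, "", "3 > 3.28", perl = TRUE)  # "3.28": the 3 is dropped
strict_result <- gsub(strict,  "", "3 > 3.28", perl = TRUE)  # "3 > 3.28": unchanged
e$newvar2_strict <- gsub(strict, "", e$newvar1, perl = TRUE) # same result as newvar2 here
```

On the sample data both patterns give the same result. The stricter one only matters if values such as 3 and 3.28, or 13 and 3, can appear next to each other.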
Upvotes: 1