Reputation: 19
I need to remove numbers that repeat consecutively in the sequence.
a b
Kor 66.73 > 66.73 > 66.73 > 66.73 > 66.73 > 66.73 >
73.42 > 66.73 > 73.42 > 66.73 > 66.73 > 66.73 >
66.73 > 66.73 > 66.73
I need the below output
c= 66.73 > 73.42 > 66.73 > 73.42 > 66.73
But I am getting the wrong output:
66.73 > 73.42 > 66.73.42 > 66.73
I used the below code
c$c <- gsub("\\b([\\w\\.]+)( > \\1\\b)+","\\1",c$b,perl = T)
Upvotes: 0
Views: 52
Reputation: 626926
Your `[\\w\\.]+` pattern matches one or more digits, letters, underscores or dots, so it can match just `54` in `12.54 > 54.12`. You need to make sure you match a complete float value, with the dot as an obligatory part of the pattern.
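To make the failure concrete, here is a minimal sketch that runs both patterns on the small example string from the explanation above. The variable names `x`, `bad` and `good` are only illustrative, and `perl = TRUE` is used for both calls as in the question's code.

```r
# Illustrative input taken from the example in the explanation
x <- "12.54 > 54.12"

# Original pattern: [\w.]+ may match only "54", so "54 > 54" is collapsed
# and the two numbers are fused together
bad <- gsub("\\b([\\w\\.]+)( > \\1\\b)+", "\\1", x, perl = TRUE)

# Stricter pattern: the captured group must be a full float (digits, dot, digits)
good <- gsub("\\b(\\d+\\.\\d+)(?:\\s+>\\s+\\1\\b)+", "\\1", x, perl = TRUE)

print(bad)   # "12.54.12"
print(good)  # "12.54 > 54.12"
```

This is the same effect that produces `66.73.42` in the question's output: the `73` at the end of `66.73` is treated as a duplicate of the `73` at the start of `73.42`.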
You may use
library(stringr)  # provides str_extract_all
sapply(str_extract_all(gsub("\\b(\\d+\\.\\d+)(?:\\s+>\\s+\\1\\b)+", "\\1", x), "\\d+\\.\\d+"), paste, collapse=" > ")
## => [1] "66.73 > 73.42 > 66.73 > 73.42 > 66.73"
With `gsub("\\b(\\d+\\.\\d+)(?:\\s+>\\s+\\1\\b)+", "\\1", x)`, you remove all the consecutive duplicate float numbers. With `str_extract_all(x1, "\\d+\\.\\d+")`, where `x1` is the result of the `gsub` call, you extract the numbers that are left. Finally, `paste` joins the found values with the `" > "` substring.
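The three steps can also be written out one at a time. The sketch below uses base R's `regmatches` and `gregexpr` as a stand-in for `str_extract_all`, so it does not need stringr. It assumes the column value is a single string whose line breaks are stored as `"\n"`; the names `x1`, `nums` and `res` are only illustrative.

```r
# Input copied from the question; storing the line breaks as "\n" is an assumption
x <- paste(
  "66.73 > 66.73 > 66.73 > 66.73 > 66.73 > 66.73 >",
  "73.42 > 66.73 > 73.42 > 66.73 > 66.73 > 66.73 >",
  "66.73 > 66.73 > 66.73",
  sep = "\n"
)

# Step 1: collapse each run of identical consecutive floats into one value
x1 <- gsub("\\b(\\d+\\.\\d+)(?:\\s+>\\s+\\1\\b)+", "\\1", x, perl = TRUE)

# Step 2: extract the floats that remain (base R alternative to str_extract_all)
nums <- regmatches(x1, gregexpr("\\d+\\.\\d+", x1))

# Step 3: join the values of each element with " > "
res <- sapply(nums, paste, collapse = " > ")
print(res)  # "66.73 > 73.42 > 66.73 > 73.42 > 66.73"
```

Steps 2 and 3 are needed because step 1 leaves the original separators in place, including the line break after the first `>`; extracting and re-pasting normalizes them to a single `" > "`.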
Upvotes: 1