R perfect

Reputation: 19

Error in gsub when applied to few data points

I need to remove numbers that repeat consecutively in the sequence.

 a      b
Kor 66.73 > 66.73 > 66.73 > 66.73 > 66.73 > 66.73 > 
        73.42 > 66.73 > 73.42 > 66.73 > 66.73 > 66.73 > 
        66.73 > 66.73 > 66.73

I need the below output

c= 66.73 > 73.42 > 66.73 > 73.42 > 66.73

But I am getting an incorrect result:

66.73 > 73.42 > 66.73.42 > 66.73

I used the code below:

c$c <- gsub("\\b([\\w\\.]+)( > \\1\\b)+", "\\1", c$b, perl = TRUE)

Upvotes: 0

Views: 52

Answers (1)

Wiktor Stribiżew

Reputation: 626926

Your [\\w\\.]+ pattern matches one or more word characters (letters, digits, underscores) or dots, so it can match just a fragment of a number, such as the 54 in 12.54 > 54.12. You need to make sure you match a complete float value, with the dot as an obligatory part of the pattern.
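To see the partial match concretely, here is a minimal sketch. The input string is a shortened, made-up sample based on the question's data, and the names bad and good are only for illustration.

```r
# Shortened sample based on the question's data (an assumption, not the full column)
x <- "66.73 > 73.42 > 66.73"

# Original pattern: [\w.]+ is free to match only "73", the fractional part of
# 66.73, which is then "repeated" by the integer part of 73.42
bad <- gsub("\\b([\\w\\.]+)( > \\1\\b)+", "\\1", x, perl = TRUE)
bad
# "66.73.42 > 66.73"

# Stricter pattern: digits, a mandatory dot, digits, so only whole floats match
good <- gsub("\\b(\\d+\\.\\d+)(?:\\s+>\\s+\\1\\b)+", "\\1", x, perl = TRUE)
good
# "66.73 > 73.42 > 66.73"  (unchanged, no consecutive duplicates here)
```

The match "73 > 73" is replaced by "73", which glues 66. and .42 together and produces the 66.73.42 seen in the question.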

You may use

library(stringr)  # provides str_extract_all()
sapply(str_extract_all(gsub("\\b(\\d+\\.\\d+)(?:\\s+>\\s+\\1\\b)+", "\\1", x), "\\d+\\.\\d+"), paste, collapse=" > ")
## => [1] "66.73 > 73.42 > 66.73 > 73.42 > 66.73"

With gsub("\\b(\\d+\\.\\d+)(?:\\s+>\\s+\\1\\b)+", "\\1", x), you remove all the consecutive duplicate float numbers. With str_extract_all(x1, "\\d+\\.\\d+"), where x1 is the result of that gsub call, you extract the numbers that are left, and paste then joins them with " > " as the separator.
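The two steps can also be sketched in base R, without stringr. This assumes the column value is a single string containing line breaks, as the question's layout suggests. The variable names (num, x1, res, nums, alt) are illustrative, and perl = TRUE is added here so that every call uses the same regex engine.

```r
# Input assumed to be one string with line breaks, as laid out in the question
x <- paste(
  "66.73 > 66.73 > 66.73 > 66.73 > 66.73 > 66.73 >",
  "73.42 > 66.73 > 73.42 > 66.73 > 66.73 > 66.73 >",
  "66.73 > 66.73 > 66.73",
  sep = "\n"
)

num <- "\\d+\\.\\d+"  # a float: digits, a mandatory dot, digits

# Step 1: collapse each run of identical consecutive floats into one
x1 <- gsub(paste0("\\b(", num, ")(?:\\s+>\\s+\\1\\b)+"), "\\1", x, perl = TRUE)

# Step 2: extract the remaining floats and rejoin them with " > "
res <- sapply(regmatches(x1, gregexpr(num, x1, perl = TRUE)),
              paste, collapse = " > ")
res
# "66.73 > 73.42 > 66.73 > 73.42 > 66.73"

# Alternative (a different technique from the answer's): extract first,
# then drop consecutive repeats with run-length encoding
nums <- regmatches(x, gregexpr(num, x, perl = TRUE))[[1]]
alt <- paste(rle(nums)$values, collapse = " > ")
alt
# "66.73 > 73.42 > 66.73 > 73.42 > 66.73"
```

The rle variant avoids the backreference entirely, which may be easier to read when the values are already being extracted into a vector.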

Upvotes: 1
