Crops

Reputation: 5154

R regex - extract strings between two characters for multiple instances

I am trying to extract some keywords from a string in R as follows.

For each "[", I want the text that lies between the first ":" following it and the next ", " or "\b".

string <- c("[G1]3451:GHEIN, [G2]FR343:4453, [G05]RT3342:34:GR", "[L1]TTG4:4532, [L3]EK445:GHR[1C]", "[RT1]JGR:45,RE")

gsub('\\[\\S+:', '', string)
"GHEIN, 4453, GR" "4532, GHR[1C]"   "45,RE"

The problem arises when an item contains two ":" characters. For the last item of the first string I should get 34:GR, not GR.

out <- c("GHEIN, 4453, 34:GR", "4532, GHR[1C]", "45,RE")

How can I get the desired result using a regex in R?

Upvotes: 1

Views: 828

Answers (1)

Alexey Ferapontov

Reputation: 5169

Make the match non-greedy, so that it stops at the first ":" after each "[" instead of running on to the last one:

gsub('*?\\[\\S+:', '', string)
[1] "GHEIN, 4453, 34:GR" "4532, GHR[1C]"      "45,RE"      
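
For comparison, here is a minimal sketch of the same idea with the lazy marker attached directly to the quantifier (\S+? rather than a leading *?). The use of perl = TRUE is an assumption of this sketch, not part of the answer above; a pattern that begins with a bare * would likely be rejected under PCRE, so this spelling may be more portable. The variable names greedy and lazy are illustrative only.

```r
# Sample input copied from the question.
string <- c("[G1]3451:GHEIN, [G2]FR343:4453, [G05]RT3342:34:GR",
            "[L1]TTG4:4532, [L3]EK445:GHR[1C]",
            "[RT1]JGR:45,RE")

# Greedy: \S+ grabs the whole whitespace-free run, then backtracks
# only as far as the LAST ":" in it, so "34:" is removed as well.
greedy <- gsub("\\[\\S+:", "", string)

# Lazy: \S+? expands one character at a time and stops at the
# FIRST ":" after each "[".
# perl = TRUE is an assumption here, chosen so PCRE semantics apply.
lazy <- gsub("\\[\\S+?:", "", string, perl = TRUE)

print(greedy)
print(lazy)
```

The trailing "[1C]" in the second string is left alone in both cases because no ":" follows it within the same run of non-whitespace characters.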

Upvotes: 4
