Mr369

Reputation: 394

R: extract the first match of a pattern counting from the end of a string

I want to extract sizes from strings such as:

a <- c("xxxxxxx 2.5 oz (23488)",
        "xxxxx /1.36oz",
        "xxxxx/7 days /20 ml")

Desired result: "2.5 oz", "/1.36oz", "/20 ml"

Because the strings vary, I want to match backward from the end. That is, I want to extract the first occurrence of \\/*(\\d+\\.*\\d*)\\s*[[:alpha:]]+ counting from the end of the string. This should keep R from taking 23488 from the first string and /7 days from the third string.

Does anyone know how I can achieve this? Thanks!

Upvotes: 1

Views: 229

Answers (2)

Kevin Kamonseki

Reputation: 141

If you know the names of the units (oz, ml, etc.), you could try something like this:

((\d*|\d*\.\d{0,2})\s?(ml|oz|etc))

See working example.
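
As a rough sketch of how the known-units idea could be applied in R: the snippet below is an adaptation, not the exact regex above. It assumes the unit list is just oz and ml (the `units` vector is a hypothetical placeholder to extend as needed), adds an optional leading / so the output matches the desired result in the question, and requires at least one digit.

```r
a <- c("xxxxxxx 2.5 oz (23488)",
       "xxxxx /1.36oz",
       "xxxxx/7 days /20 ml")

# Assumed unit list (hypothetical); add further units here as needed
units <- c("oz", "ml")

# Optional "/", a number with an optional decimal part, an optional
# whitespace character, then one of the known units as a whole word
pattern <- paste0("/?\\d+(?:\\.\\d+)?\\s?(?:",
                  paste(units, collapse = "|"),
                  ")\\b")

# regexpr() locates the first match in each string;
# regmatches() extracts the matched text
sizes <- regmatches(a, regexpr(pattern, a, perl = TRUE))
sizes
```

Because "days" is not in the unit list, /7 days is skipped without any need to search from the end of the string. The trade-off is that any unit missing from `units` will not be matched.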

Upvotes: 1

Wiktor Stribiżew

Reputation: 627600

You may use

> a <- c("xxxxxxx 2.5 oz (23488)",
+         "xxxxx /1.36oz",
+         "xxxxx/7 days /20 ml")
> regmatches(a, regexpr("/?\\d+(?:\\.\\d+)?\\s*\\p{L}+(?!.*\\d(?:\\.\\d+)?\\s*\\p{L}+)", a, perl=TRUE))
[1] "2.5 oz"  "/1.36oz" "/20 ml" 

See the regex demo.

Details

  • /? - an optional /
  • \\d+ - 1+ digits
  • (?:\\.\\d+)? - an optional . and 1+ digits sequence
  • \\s* - 0+ whitespaces
  • \\p{L}+ - 1+ letters
  • (?!.*\\d(?:\\.\\d+)?\\s*\\p{L}+) - a negative lookahead: the match must not be followed by
    • .* - any 0+ chars other than line break chars, as many as possible
    • \\d - a digit
    • (?:\\.\\d+)? - an optional . and 1+ digits sequence
    • \\s* - 0+ whitespaces
    • \\p{L}+ - 1+ letters
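
An alternative way to get the same "last match" behavior, sketched here as one possible approach rather than as part of the answer above: collect every match with gregexpr() and keep the last one per string. The pattern is the same core expression without the lookahead; the names `pattern`, `all_matches`, and `last_match` are illustrative.

```r
a <- c("xxxxxxx 2.5 oz (23488)",
       "xxxxx /1.36oz",
       "xxxxx/7 days /20 ml")

# Same core pattern as above, without the negative lookahead
pattern <- "/?\\d+(?:\\.\\d+)?\\s*\\p{L}+"

# gregexpr() finds all non-overlapping matches in each string;
# regmatches() then returns a list with one character vector per string
all_matches <- regmatches(a, gregexpr(pattern, a, perl = TRUE))

# Keep the last match of each string, or NA when nothing matched
last_match <- vapply(
  all_matches,
  function(m) if (length(m) > 0) m[length(m)] else NA_character_,
  character(1)
)
last_match
```

Unlike regmatches() combined with regexpr(), which silently drops strings that have no match, this version returns NA for them, so the result always has the same length as the input.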

Upvotes: 3
