Reputation: 394
I want to extract sizes from strings, which can be:
a <- c("xxxxxxx 2.5 oz (23488)",
"xxxxx /1.36oz",
"xxxxx/7 days /20 ml")
Result I want: 2.5 oz /1.36oz /20 ml
Because the strings vary, I want to match backward. That is, I want to extract the first occurrence of \\/*(\\d+\\.*\\d*)\\s*[[:alpha:]]+ counting from the end of the string. That would keep R from taking 23488 from the first string and /7 days from the third string.
Does anyone know how I can achieve this? Thanks!
Upvotes: 1
Views: 229
Reputation: 141
If you know the names of the units (oz, ml, etc.), you could try something like this:
((\d*|\d*\.\d{0,2})\s?(ml|oz|etc))
See working example.
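As a rough sketch of how the known-units idea could be applied in R: the pattern below is a variation on the one above, not the same expression. It assumes the unit list is just oz and ml, requires at least one digit, and allows an optional leading / so the output matches what the question asks for. The names known_units, pattern and sizes are made up for illustration.

```r
a <- c("xxxxxxx 2.5 oz (23488)",
       "xxxxx /1.36oz",
       "xxxxx/7 days /20 ml")

# Assumed unit list; extend it with whatever units occur in the data.
known_units <- c("oz", "ml")

# Optional "/", a number with an optional decimal part, optional whitespace,
# then one of the known units followed by a word boundary.
# "7 days" and "23488" are skipped because neither is followed by a listed unit.
pattern <- paste0("/?\\d+(?:\\.\\d+)?\\s*(?:",
                  paste(known_units, collapse = "|"),
                  ")\\b")

# regexpr() finds the first match per string; regmatches() extracts it.
sizes <- regmatches(a, regexpr(pattern, a, perl = TRUE))
print(sizes)
```

This only works when every unit is known in advance; a string whose size uses an unlisted unit is silently dropped from the result.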
Upvotes: 1
Reputation: 627600
You may use
> a <- c("xxxxxxx 2.5 oz (23488)",
+ "xxxxx /1.36oz",
+ "xxxxx/7 days /20 ml")
> regmatches(a, regexpr("/?\\d+(?:\\.\\d+)?\\s*\\p{L}+(?!.*\\d(?:\\.\\d+)?\\s*\\p{L}+)", a, perl=TRUE))
[1] "2.5 oz" "/1.36oz" "/20 ml"
See the regex demo.
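For use in a script rather than at the console, the same call could be wrapped roughly as follows. This is only a sketch: the function name extract_last_size is hypothetical, and the NA handling is an addition, since regmatches() with regexpr() drops strings that have no match and would otherwise return a shorter vector than the input.

```r
# Hypothetical helper: return the last "number + letters" token of each string,
# or NA when a string contains no such token.
extract_last_size <- function(x) {
  # Same pattern as in the answer: the negative lookahead rejects a match
  # if another digit(s) + letters token occurs later in the string.
  pattern <- "/?\\d+(?:\\.\\d+)?\\s*\\p{L}+(?!.*\\d(?:\\.\\d+)?\\s*\\p{L}+)"
  m <- regexpr(pattern, x, perl = TRUE)   # -1 where there is no match
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)        # fill in only the matched positions
  out
}

a <- c("xxxxxxx 2.5 oz (23488)",
       "xxxxx /1.36oz",
       "xxxxx/7 days /20 ml",
       "no size here")

result <- extract_last_size(a)
print(result)
# expected values: "2.5 oz", "/1.36oz", "/20 ml", NA
```

Keeping the result the same length as the input makes it safe to assign back into a data frame column.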
Details
/? - an optional /
\\d+ - 1+ digits
(?:\\.\\d+)? - an optional . and 1+ digits sequence
\\s* - 0+ whitespaces
\\p{L}+ - 1+ letters
(?!.*\\d(?:\\.\\d+)?\\s*\\p{L}+) - not followed with
    .* - any 0+ chars, as many as possible
    \\d - a digit
    (?:\\.\\d+)? - an optional . and 1+ digits sequence
    \\s* - 0+ whitespaces
    \\p{L}+ - 1+ letters
Upvotes: 3