Reputation: 49
I have a string like this:
"Samsung LA32D450 LCD Television 32inch Black"
I need to extract the size of the TV, so I need to extract all characters between 'inch' and the preceding whitespace. In this case I would need the expression to return 32.
The regular expression needs to be able to deal with decimal points.
For example I would need 32.5
from this string:
"Samsung LA32D450 LCD Television 32.5inch Black"
Upvotes: 0
Views: 1074
Reputation: 204
(?<=\s)(\d+\.?\d*)(?=inch)
It matches the number that comes immediately before inch. The \d+ part matches the integer digits, \.? then matches an optional decimal point, and \d* matches the fractional digits, if any.
After realizing that it accepted numbers like .6, I made a quick edit: the lookbehind (?<=\s) now requires a whitespace character immediately before the digits.
https://regex101.com/r/tionn9/2
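As a quick check of the pattern above, here is a minimal Python sketch. The choice of Python and the helper name extract_size are assumptions for illustration only; the question does not say which regex engine is in use.

```python
import re

# Pattern from the answer above: digits, an optional dot, optional fractional
# digits, preceded by whitespace and followed by "inch".
SIZE_RE = re.compile(r"(?<=\s)(\d+\.?\d*)(?=inch)")

def extract_size(title):
    """Return the size as a string, or None if nothing matches.

    extract_size is a hypothetical helper name, not part of any library.
    """
    match = SIZE_RE.search(title)
    return match.group(1) if match else None

print(extract_size("Samsung LA32D450 LCD Television 32inch Black"))
print(extract_size("Samsung LA32D450 LCD Television 32.5inch Black"))
```

Because the lookbehind requires a whitespace character, a size at the very start of the string (with nothing before it) would not match under this pattern.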
Upvotes: 0
Reputation: 3519
You need to match numbers (possibly including a dot) followed by the word inch.
You can use lookaheads to get what you want:
[\d.]+(?=inch)
This will match a combination of digits and dots, repeated one or more times, followed by the word inch.
You can, of course, get more precise by specifying the format of the numbers.
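To see what the loose character class accepts, here is a small Python sketch (Python and the helper name find_sizes are assumptions for illustration). It shows the pattern working on the question's string and also accepting input that is not a well-formed number, which is why a stricter format can be worth the effort.

```python
import re

# Loose pattern from above: any run of digits and dots followed by "inch".
LOOSE_RE = re.compile(r"[\d.]+(?=inch)")

def find_sizes(text):
    """Return every run of digits/dots that is directly followed by 'inch'.

    find_sizes is a hypothetical helper name used only for this sketch.
    """
    return LOOSE_RE.findall(text)

print(find_sizes("Samsung LA32D450 LCD Television 32.5inch Black"))
# The character class does not enforce a number format, so this also matches:
print(find_sizes("version 1.2.3inch"))
```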
EDIT:
Getting more precise about the number format can introduce extra complexity. I came up with this regex, which matches either 2-3 digits followed by "inch" (23inch) or 2-3 digits followed by a dot and one digit from 1 to 9 followed by "inch" (23.5inch). It uses both a lookahead and a negative lookbehind, so your regex engine must support these constructs:
\b(?<![.\d])([1-9][0-9]{1,2}\.[1-9]|[1-9][0-9]{1,2})(?=inch)
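The stricter pattern can be exercised the same way. This is a sketch in Python (an assumed engine; the helper name extract_strict is hypothetical), using the regex exactly as written above.

```python
import re

# Strict pattern from above: 2-3 digits (no leading zero), optionally followed
# by a dot and a single digit 1-9, and then "inch".
STRICT_RE = re.compile(
    r"\b(?<![.\d])([1-9][0-9]{1,2}\.[1-9]|[1-9][0-9]{1,2})(?=inch)"
)

def extract_strict(title):
    """Return the matched size as a string, or None if the format is rejected.

    extract_strict is a hypothetical helper name used only for this sketch.
    """
    match = STRICT_RE.search(title)
    return match.group(1) if match else None

print(extract_strict("Samsung LA32D450 LCD Television 32.5inch Black"))
print(extract_strict("Portable TV 5inch Black"))   # single digit is rejected
```

Note that the fractional digit is restricted to 1-9, so a size written as 32.0inch is rejected as well; widen that class to [0-9] if such values should be accepted.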
Upvotes: 2
Reputation: 137
Try the following:
library(stringr)
a <- "Samsung LA32D450 LCD Television 32.1inch Black"
str_extract(a, "[:graph:]*(?=inch)")
[:graph:] matches any visible character (letters, digits or punctuation), but not whitespace.
(?=inch) is a lookahead: the match must be immediately followed by "inch", which is not itself included in the result.
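For readers not using R, the same idea can be sketched in Python. This is only an approximation: \S (any non-whitespace character) is used as a stand-in for [:graph:], and the helper name text_before_inch is hypothetical.

```python
import re

def text_before_inch(title):
    """Return the run of non-whitespace characters directly before 'inch'.

    Assumption: \\S approximates [:graph:] closely enough for product titles.
    """
    match = re.search(r"\S*(?=inch)", title)
    return match.group(0) if match else None

a = "Samsung LA32D450 LCD Television 32.1inch Black"
print(text_before_inch(a))
```

Since this approach accepts any visible characters, it does not check that the result is a number; a title containing a word such as "pinch" would yield "p".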
Good luck,
Ludo
Upvotes: 0