Reputation: 767
I have a list of strings such as:
2007 ford falcon xr8 ripcurl bf mkii utility 5.4l v8 cyl 6 sp manual bionic
2004 nissan x-trail ti 4x4 t30 4d wagon 2.5l 4 cyl 5 sp manual twilight
2002 subaru liberty rx my03 4d sedan 2.5l 4 cyl 5 sp manual silver
I want to truncate each string at either the engine capacity (5.4l, 2.5l) or the body type (4d wagon, 4d sedan), whichever comes first. So the output should be:
2007 ford falcon xr8 ripcurl bf mkii utility
2004 nissan x-trail ti 4x4 t30
2002 subaru liberty rx my03
I figure I will create a list of words with .split(' '). However, my problem is how to stop at an x.xl or xd word, where x could be any number. What sort of regex would pick this up?
Upvotes: 2
Views: 136
Reputation: 67968
^.*?(?=\s*\d+d\s+(?:wagon|sedan)|\s*\d+(?:\.\d+)?l)
You can use this. See the demo:
https://regex101.com/r/aC0uK6/1
import re

# Lazily match from the start of each line up to (but not including) the
# body type (e.g. "4d wagon") or the engine capacity (e.g. "5.4l").
p = re.compile(r'^.*?(?=\s*\d+d\s+(?:wagon|sedan)|\s*\d+(?:\.\d+)?l)', re.MULTILINE)
test_str = "2007 ford falcon xr8 ripcurl bf mkii utility 5.4l v8 cyl 6 sp manual bionic \n2004 nissan x-trail ti 4x4 t30 4d wagon 2.5l 4 cyl 5 sp manual twilight \n2002 subaru liberty rx my03 4d sedan 2.5l 4 cyl 5 sp manual silver "
print(p.findall(test_str))
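Since the question starts from a list of strings rather than one multi-line string, the same lookahead pattern could also be applied per item. This is only a sketch: the names CUTOFF and truncate are made up for illustration, and returning the stripped input when neither marker is found is an assumption, not something the question specifies.

```python
import re

# Same lookahead pattern as above, compiled once. No re.MULTILINE is needed
# because each string is matched on its own.
CUTOFF = re.compile(r'^.*?(?=\s*\d+d\s+(?:wagon|sedan)|\s*\d+(?:\.\d+)?l)')


def truncate(title):
    """Return the part of title before the body type or engine capacity."""
    match = CUTOFF.match(title)
    # Assumption: if neither marker is present, keep the whole (stripped) string.
    return match.group(0) if match else title.strip()


titles = [
    "2007 ford falcon xr8 ripcurl bf mkii utility 5.4l v8 cyl 6 sp manual bionic",
    "2004 nissan x-trail ti 4x4 t30 4d wagon 2.5l 4 cyl 5 sp manual twilight",
    "2002 subaru liberty rx my03 4d sedan 2.5l 4 cyl 5 sp manual silver",
]

for title in titles:
    print(truncate(title))
```

Because \s* sits inside the lookahead, the space before the marker is left out of the match, so the results carry no trailing whitespace.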
Upvotes: 1
Reputation: 473863
One option would be to replace everything starting from the word that has a number followed by l, or a number followed by d followed by wagon or sedan, with an empty string using re.sub():
>>> import re
>>>
>>> l = ["2007 ford falcon xr8 ripcurl bf mkii utility 5.4l v8 cyl 6 sp manual bionic ", "2004 nissan x-trail ti 4x4 t30 4d wagon 2.5l 4 cyl 5 sp manual twilight ", "2002 subaru liberty rx my03 4d sedan 2.5l 4 cyl 5 sp manual silver"]
>>> for item in l:
... print(re.sub(r"(\b[0-9.]+l\b|\d+d (?:wagon|sedan)).*$", "", item))
...
2007 ford falcon xr8 ripcurl bf mkii utility
2004 nissan x-trail ti 4x4 t30
2002 subaru liberty rx my03
where:
\b[0-9.]+l\b would match a word that has one or more digits or dots ending with l
\d+d (?:wagon|sedan) would match one or more digits followed by a letter d, followed by a space and wagon or sedan; (?:...) means a non-capturing group
Upvotes: 2