Reputation: 490
I have a column containing strings that are made up of different words but always have a similar structure. E.g.:
2cm off ORDER AGAIN (191 1141)
I want to extract the sub-string that starts after the second space and ends at the space before the opening bracket/parenthesis. So in this example I want to extract ORDER AGAIN.
Is this possible?
Upvotes: 0
Views: 1039
Reputation: 1
You can try the following code
s = '2cm off ORDER AGAIN (191 1141)'
second_space = s.find(' ', s.find(' ') + 1)
openparenthesis = s.find('(')
# Start one character past the second space and strip the space before '('
substring = s[second_space + 1 : openparenthesis].strip()
print(substring)  # ORDER AGAIN
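As a rough sketch, the same find-based idea can be wrapped in a small helper that also copes with strings that do not follow the expected shape. The name `extract_between` is hypothetical, and returning None for malformed input is an assumption about what is wanted here.

```python
def extract_between(s):
    """Return the text after the second space and before '(', or None."""
    first_space = s.find(' ')
    if first_space == -1:
        return None  # no space at all
    second_space = s.find(' ', first_space + 1)
    if second_space == -1:
        return None  # fewer than two spaces
    open_paren = s.find('(', second_space + 1)
    if open_paren == -1:
        return None  # no opening parenthesis after the second space
    # Skip the second space itself and drop the space before '('
    return s[second_space + 1:open_paren].strip()


print(extract_between('2cm off ORDER AGAIN (191 1141)'))  # ORDER AGAIN
print(extract_between('2cm off'))  # None
```

Because str.find returns -1 when nothing is found, checking each result before slicing avoids silently slicing from the wrong position.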
Upvotes: 0
Reputation: 1856
If the pattern of data is similar to what you have posted then I think the below code snippet should work for you:
import re
data = "2cm off ORDER AGAIN (191 1141)"
extr = re.match(r".*?\s.*?\s(.*)\s\(.*", data)
if extr:
    print(extr.group(1))
Upvotes: 0
Reputation: 521289
You could use str.extract here:
df["out"] = df["col"].str.extract(r'^\w+ \w+ (.*?)(?: \(|$)')
Note that this answer is robust even if the string doesn't have a (...) term at the end.
Here is a demo showing that the regex logic is working.
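The pattern can also be checked locally without pandas. This is only a sketch using the standard re module as a stand-in for str.extract; the sample strings are made up for illustration, and the second one deliberately has no (...) term.

```python
import re

# Same pattern as in the str.extract call above
PATTERN = re.compile(r'^\w+ \w+ (.*?)(?: \(|$)')

samples = [
    '2cm off ORDER AGAIN (191 1141)',  # with a trailing (...) term
    '2cm off ORDER AGAIN',             # without it
]

results = []
for text in samples:
    m = PATTERN.search(text)
    # group(1) is the lazily matched text after the first two words
    results.append(m.group(1) if m else None)

print(results)  # ['ORDER AGAIN', 'ORDER AGAIN']
```

The lazy (.*?) stops at the first " (" if there is one, and otherwise runs to the end of the string, which is why both samples give the same result.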
Upvotes: 1
Reputation: 9197
You can try the following:
r"2cm off ORDER AGAIN (191 1141)".split(r"(")[0].split(" ", maxsplit=2)[-1].strip()
#Out[3]: 'ORDER AGAIN'
Upvotes: 1