James

Reputation: 490

Extracting Sub-string Between Two Characters in String in Pandas Dataframe

I have a column containing strings that are made up of different words but always have a similar structure. E.g.:

2cm off ORDER AGAIN (191 1141)

I want to extract the sub-string that starts after the second space and ends at the space before the opening bracket/parenthesis. So in this example I want to extract ORDER AGAIN.

Is this possible?

Upvotes: 0

Views: 1039

Answers (4)

Kunal Gautam

Reputation: 1

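You can try the following code:

s = '2cm off ORDER AGAIN (191 1141)'
# Index of the second space: search again starting just past the first one
second_space = s.find(' ', s.find(' ') + 1)
openparenthesis = s.find('(')
# Slice between the two positions; strip() drops the surrounding spaces
substring = s[second_space:openparenthesis].strip()
print(substring)  # ORDER AGAIN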

Upvotes: 0

Bhagyesh Dudhediya

Reputation: 1856

If the pattern of data is similar to what you have posted then I think the below code snippet should work for you:

import re
data = "2cm off ORDER AGAIN (191 1141)"

extr = re.match(r".*?\s.*?\s(.*)\s\(.*", data)
if extr:
    print(extr.group(1))  # ORDER AGAIN

Upvotes: 0

Tim Biegeleisen

Reputation: 521289

You could use str.extract here:

df["out"] = df["col"].str.extract(r'^\w+ \w+ (.*?)(?: \(|$)')

Note that this answer is robust even if the string doesn't have a (...) term at the end.

Here is a demo showing that the regex logic is working.
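
To check the pattern locally without pandas, here is a minimal sketch using Python's re module. It assumes that str.extract returns the first capture group of the first match, which re.search mirrors here. The helper name extract_middle and the sample strings other than the one from the question are made up for illustration.

```python
import re

# Same pattern as in the str.extract call above
PATTERN = re.compile(r'^\w+ \w+ (.*?)(?: \(|$)')

def extract_middle(text):
    """Return the text after the second word, up to ' (' or the end of the string."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

samples = [
    "2cm off ORDER AGAIN (191 1141)",  # from the question
    "2cm off ORDER AGAIN",             # no (...) term at the end
    "single",                          # fewer than two leading words: no match
]
results = [extract_middle(s) for s in samples]
print(results)  # ['ORDER AGAIN', 'ORDER AGAIN', None]
```

The second sample illustrates the case without a trailing (...) term, and the third shows that a string without two leading words yields no match.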

Upvotes: 1

Andreas

Reputation: 9197

You can try the following:

r"2cm off ORDER AGAIN (191 1141)".split(r"(")[0].split(" ", maxsplit=2)[-1].strip()
#Out[3]: 'ORDER AGAIN'
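
The same chain can probably be applied to a whole column through the .str accessor. A minimal sketch, assuming the column is called col (a placeholder name) and using a made-up second row:

```python
import pandas as pd

df = pd.DataFrame({"col": [
    "2cm off ORDER AGAIN (191 1141)",   # from the question
    "5kg box HANDLE WITH CARE (22 7)",  # made-up row for illustration
]})

df["out"] = (
    df["col"]
    .str.split("(").str[0]         # keep everything before the first "("
    .str.split(" ", n=2).str[-1]   # drop the first two space-separated words
    .str.strip()                   # remove the trailing space
)
print(df["out"].tolist())
```

Note that Series.str.split names the split limit n, where the built-in str.split uses maxsplit.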

Upvotes: 1
