Vyacheslav Gorbov

Reputation: 161

How do you extract a substring after a pattern?

I'm fairly new to Python and would like to know the best way to extract the substring that follows a certain pattern. Each string has the form "Prefix - Postfix", and I want to isolate the Postfix. The Prefix is guaranteed to contain only letters, but its length is not fixed. The Postfix, on the other hand, can contain any character whatsoever, including spaces and hyphens. I simply need to drop the "Prefix - " part and keep the Postfix.

"""
Example input:
Intern - RVA-QA PK
Fulltime - VA - BN146
Intern - STP_NA
Intern - ZXU RMP LM

Desired result:
RVA-QA PK
VA - BN146
STP_NA
ZXU RMP LM
"""

What would be the best way to achieve this? I have the following code, but it doesn't quite do what I want:

from sqlalchemy import create_engine

url = 'mysql://scott:tiger@localhost/test'
engine = create_engine(url)
db = engine.connect()

# Construct Query
query = "SELECT name FROM items"

# Obtain table information
item_list = db.execute(query)

# Declare list that will hold the results
result_list = []

for item in item_list:
    result_list.append(item[0].rsplit('-', 1)[1].strip())

return result_list

Would you recommend I use regex? Or is there a better way? Any advice or help is appreciated.

Thank you

Upvotes: 1

Views: 4347

Answers (5)

a1426

Reputation: 256

This was the best (shortest) regex I could come up with that returns what you want. It should hopefully handle the edge cases (e.g. dashes inside your desired string). However, the output has some spacing issues.

import re
the_str = """
Intern - RVA-QA PK
Fulltime - VA - BN146
Intern - STP_NA
Intern - ZXU RMP LM
"""
reg = re.compile("\n.*?- ")
a = re.sub(reg,"\n",the_str)

print(a)

returns:


RVA-QA PK
VA - BN146
STP_NA
ZXU RMP LM

The spacing is odd (the multiline string starts and ends with a newline), but you can remove it with .strip("\n"). A second regex would be:

import re
the_str = """
Intern - RVA-QA PK
Fulltime - VA - BN146
Intern - STP_NA
Intern - ZXU RMP LM
"""
reg = re.compile("\n.*?- (.*)")
a = re.findall(reg,the_str)
print(a)

This returns a list of all the correct results, without any spacing issues. Output: ['RVA-QA PK', 'VA - BN146', 'STP_NA', 'ZXU RMP LM']
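Since the question guarantees a letters-only prefix, the pattern can also be anchored to the start of each line instead of relying on the leading newline. The following is only a sketch under that assumption; the name `pattern` and the use of `re.MULTILINE` are my choices, not part of the answer above.

```python
import re

# Sample data copied from the question.
the_str = """
Intern - RVA-QA PK
Fulltime - VA - BN146
Intern - STP_NA
Intern - ZXU RMP LM
"""

# ^[A-Za-z]+ matches the letters-only prefix at the start of a line
# (re.MULTILINE makes ^ and $ match at every line boundary).
# The group captures everything after the first " - " up to the line end.
pattern = re.compile(r"^[A-Za-z]+ - (.*)$", re.MULTILINE)

postfixes = pattern.findall(the_str)
print(postfixes)
```

Because the prefix is restricted to letters, a hyphen inside the postfix cannot be mistaken for the separator.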

Hope this helped!

Upvotes: 2

Jeongmin

Reputation: 166

I don't think you need to use regex since you simply want to extract the substring after the first appearance of a specific sequence of characters.

The str.index() method returns the index of a substring within a string (the first occurrence, if there is more than one), so you can use it to find the location of the separator. You can then extract the postfix with string slicing.

The code below should print Postfix.

item = 'Prefix - Postfix'
separator = ' - '
start = item.index(separator) + len(separator)
print(item[start:])

Try this with your examples. https://www.pythonpad.co/pads/edtnyn2hk6u4ns8h/
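One caveat: str.index() raises ValueError when the separator is not found. If some rows might lack the separator, str.partition() is a possible alternative. This is a rough sketch; the helper name `extract_postfix` and the choice to return None for a missing separator are assumptions made for illustration.

```python
def extract_postfix(item, separator=" - "):
    """Return the text after the first separator, or None if it is absent."""
    # str.partition splits at the first occurrence and returns a 3-tuple:
    # (text before, separator, text after). When the separator is missing,
    # the second and third elements are empty strings.
    _, found, postfix = item.partition(separator)
    return postfix if found else None


print(extract_postfix("Fulltime - VA - BN146"))
print(extract_postfix("NoSeparatorHere"))
```

The first call prints VA - BN146 and the second prints None, so the caller can decide how to handle malformed rows.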

Upvotes: 2

Vyacheslav Gorbov

Reputation: 161

The correct solution seems to be the following:

for item in item_list:
    result_list.append(item[0].split(' - ', 1)[1].strip())

Thanks for all the answers.
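To see why the original rsplit('-', 1) call went wrong, here is a small comparison on the sample inputs from the question. It uses a plain list of strings in place of the database rows, which is an assumption made to keep the sketch self-contained; the names `items`, `from_right` and `from_left` are illustrative.

```python
# Sample strings from the question, standing in for the database rows.
items = [
    "Intern - RVA-QA PK",
    "Fulltime - VA - BN146",
    "Intern - STP_NA",
    "Intern - ZXU RMP LM",
]

# rsplit('-', 1) cuts at the LAST hyphen, which may sit inside the postfix.
from_right = [item.rsplit("-", 1)[1].strip() for item in items]

# split(' - ', 1) cuts at the FIRST " - ", i.e. right after the prefix.
from_left = [item.split(" - ", 1)[1].strip() for item in items]

print(from_right)  # ['QA PK', 'BN146', 'STP_NA', 'ZXU RMP LM']
print(from_left)   # ['RVA-QA PK', 'VA - BN146', 'STP_NA', 'ZXU RMP LM']
```

The two approaches only disagree when the postfix itself contains a hyphen, which is why the bug is easy to miss on simple inputs.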

Upvotes: 1

JaySabir

Reputation: 322

If you want to remove everything up to and including the first " - "

just try:

import re
text = "example - postfix"
re.sub(r"^.*? - ", "", text)

output:

"postfix"

I am using regex here. You can also use text.split(" - ", 1)[1]
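Whether the quantifier is greedy matters as soon as the postfix contains a hyphen. The snippet below is a minimal illustration on one of the sample inputs from the question; the variable names are hypothetical.

```python
import re

text = "Fulltime - VA - BN146"

# Greedy: ".+-" consumes as much as possible, so it ends at the LAST hyphen
# and also leaves the space that follows it.
greedy = re.sub(r".+-", "", text)

# Lazy and anchored: "^.*? - " stops at the FIRST " - " after the prefix.
lazy = re.sub(r"^.*? - ", "", text)

print(repr(greedy))  # ' BN146'
print(repr(lazy))    # 'VA - BN146'
```

Including the surrounding spaces in the pattern also removes the need for a separate strip() call.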

Upvotes: 2

Senior0023

Reputation: 162

You can use Python's split() and strip() methods. split() returns a list of chunks. For example:

m_string = "I-have-got-an-example"
result1 = m_string.split('-')

result1 is ['I', 'have', 'got', 'an', 'example']. If the separators are surrounded by spaces, split() alone leaves whitespace in the chunks, so you have to use strip() as well.

You can try this example:

m_string = "I - have - got- an -example"
result = [x.strip() for x in m_string.split('-')]

result is ["I", "have", "got", "an", "example"]

I hope this will be helpful for you.

Upvotes: 1
