Reputation: 161
I'm fairly new to Python. I would like to know the best way to extract a substring after a certain pattern. The pattern is Prefix - Postfix, and I would like to isolate the Postfix. I can guarantee that the Prefix contains only letters, but I cannot guarantee its length. The Postfix, on the other hand, may contain spaces and hyphens; it can contain any character whatsoever. I simply need to remove the "Prefix - " part and keep the Postfix.
"""
Example input:
Intern - RVA-QA PK
Fulltime - VA - BN146
Intern - STP_NA
Intern - ZXU RMP LM
Desired result:
RVA-QA PK
VA - BN146
STP_NA
ZXU RMP LM
"""
What would be the best way to achieve this? I have the following code, but it doesn't quite do what I want it to:
from sqlalchemy import create_engine, text

def get_postfixes():
    url = 'mysql://scott:tiger@localhost/test'
    engine = create_engine(url)
    with engine.connect() as db:
        # Construct query (raw SQL is wrapped in text(); SQLAlchemy 2.0 rejects plain strings)
        query = text("SELECT name FROM items")
        # Obtain table information
        item_list = db.execute(query)
        # Declare list that will hold the results
        result_list = []
        for item in item_list:
            # rsplit splits at the LAST '-', so a postfix containing '-' comes out truncated
            result_list.append(item[0].rsplit('-', 1)[1].strip())
    return result_list
Would you recommend I use regex? Or is there a better way? Any advice or help is appreciated.
Thank you
Upvotes: 1
Views: 4347
Reputation: 256
This was the best (shortest) regex I could come up with that returns what you wanted. It should handle the edge cases (e.g. dashes inside the desired string). However, there are some spacing issues.
import re
the_str = """
Intern - RVA-QA PK
Fulltime - VA - BN146
Intern - STP_NA
Intern - ZXU RMP LM
"""
# A newline, then the shortest run of characters up to "- " (the prefix and separator)
reg = re.compile(r"\n.*?- ")
a = reg.sub("\n", the_str)
print(a)
prints:
RVA-QA PK
VA - BN146
STP_NA
ZXU RMP LM
The spacing is off (the triple-quoted string starts and ends with a newline, so the output has a blank first and last line), but you could just remove it with .strip("\n"). A second regex would be:
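A minimal sketch of that cleanup, reusing the first pattern and the same sample string: stripping the leading and trailing newlines leaves one postfix per line, and splitlines() turns the result into a list if you need one. The names cleaned and postfixes are just for illustration.

```python
import re

the_str = """
Intern - RVA-QA PK
Fulltime - VA - BN146
Intern - STP_NA
Intern - ZXU RMP LM
"""

# Replace each "newline + prefix + '- '" with just a newline
reg = re.compile(r"\n.*?- ")
# strip("\n") drops the blank first and last lines left by the triple-quoted string
cleaned = reg.sub("\n", the_str).strip("\n")
print(cleaned)

# One postfix per list entry
postfixes = cleaned.splitlines()
print(postfixes)  # ['RVA-QA PK', 'VA - BN146', 'STP_NA', 'ZXU RMP LM']
```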
import re
the_str = """
Intern - RVA-QA PK
Fulltime - VA - BN146
Intern - STP_NA
Intern - ZXU RMP LM
"""
# Same prefix match, but capture the rest of the line (. does not match a newline)
reg = re.compile(r"\n.*?- (.*)")
a = reg.findall(the_str)
print(a)
This returns a list of all the correct answers, without any spacing issues.
Output:
['RVA-QA PK', 'VA - BN146', 'STP_NA', 'ZXU RMP LM']
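Both patterns above rely on each line being preceded by a newline, so the first line would be missed if the string did not start with one. Since the question guarantees the Prefix contains only letters, a variant anchored with re.MULTILINE avoids that dependency. This is a sketch under that letters-only assumption; a prefix containing digits or underscores would not match.

```python
import re

# No leading newline this time: the first line must still be matched
the_str = """Intern - RVA-QA PK
Fulltime - VA - BN146
Intern - STP_NA
Intern - ZXU RMP LM"""

# With re.MULTILINE, ^ and $ match at every line boundary;
# [A-Za-z]+ is the letters-only prefix and (.*) captures the rest of the line
reg = re.compile(r"^[A-Za-z]+ - (.*)$", re.MULTILINE)
postfixes = reg.findall(the_str)
print(postfixes)  # ['RVA-QA PK', 'VA - BN146', 'STP_NA', 'ZXU RMP LM']
```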
Hope this helped!
Upvotes: 2
Reputation: 166
I don't think you need to use regex since you simply want to extract the substring after the first appearance of a specific sequence of characters.
The str.index() method returns the index of a substring inside the string (the first occurrence, if there is more than one), so use it to find the location of the separator. You can then extract the postfix with string slicing.
The code below should print Postfix.
item = 'Prefix - Postfix'
separator = ' - '
start = item.index(separator) + len(separator)
print(item[start:])
Try this with your examples. https://www.pythonpad.co/pads/edtnyn2hk6u4ns8h/
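One caveat: str.index() raises ValueError when the separator is missing, so a single name without " - " would stop the whole loop. Below is a minimal sketch using str.partition() instead, which never raises. The function name extract_postfix and the choice to return a name unchanged when it has no separator are my own assumptions, not something from the question.

```python
def extract_postfix(item, separator=" - "):
    # partition splits at the FIRST occurrence of separator and returns
    # (before, separator, after); if separator is absent it returns (item, "", "")
    _, found, postfix = item.partition(separator)
    # Hypothetical policy: leave names without a separator unchanged
    return postfix.strip() if found else item


names = [
    "Intern - RVA-QA PK",
    "Fulltime - VA - BN146",
    "Intern - STP_NA",
    "Intern - ZXU RMP LM",
    "NoSeparatorHere",
]
results = [extract_postfix(n) for n in names]
print(results)
# ['RVA-QA PK', 'VA - BN146', 'STP_NA', 'ZXU RMP LM', 'NoSeparatorHere']
```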
Upvotes: 2
Reputation: 161
The correct solution seems to be the following:
for item in item_list:
    result_list.append(item[0].split(' - ', 1)[1].strip())
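For comparison, here is the original rsplit call and this split call run side by side on the sample names from the question, using a plain list instead of the database (the list and variable names are just for illustration). rsplit('-', 1) splits at the last hyphen, so any postfix that itself contains a hyphen gets cut short, while split(' - ', 1) splits only at the first " - ", which is the end of the letters-only prefix.

```python
names = [
    "Intern - RVA-QA PK",
    "Fulltime - VA - BN146",
    "Intern - STP_NA",
    "Intern - ZXU RMP LM",
]

# rsplit('-', 1): splits at the LAST hyphen in the string
rsplit_results = [n.rsplit('-', 1)[1].strip() for n in names]

# split(' - ', 1): splits at the FIRST " - ", i.e. right after the prefix
split_results = [n.split(' - ', 1)[1].strip() for n in names]

print(rsplit_results)  # ['QA PK', 'BN146', 'STP_NA', 'ZXU RMP LM']
print(split_results)   # ['RVA-QA PK', 'VA - BN146', 'STP_NA', 'ZXU RMP LM']
```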
Thanks for all the answers.
Upvotes: 1
Reputation: 322
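If you want to remove everything before the first "-", just try:
import re
s = "example - postfix"
# Remove everything up to and including the first "-", plus the whitespace after it
print(re.sub(r"^[^-]*-\s*", "", s))
output:
postfix
I am using regex here. You can also use s.split("-", 1)[1].strip(); the maxsplit argument of 1 keeps any later hyphens in the postfix.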
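Note that a greedy .+- runs up to the last hyphen in the string, which matters for the question's data. A quick comparison on the sample names, pairing it with a pattern that stops at the first hyphen (variable names are illustrative):

```python
import re

names = [
    "Intern - RVA-QA PK",
    "Fulltime - VA - BN146",
    "Intern - STP_NA",
    "Intern - ZXU RMP LM",
]

# Greedy: .+ backtracks only as far as the LAST "-", and no whitespace is removed
greedy = [re.sub(r".+-", "", n) for n in names]

# Anchored: [^-]* cannot cross a hyphen, so the match ends at the FIRST "-";
# \s* then consumes the space after it
anchored = [re.sub(r"^[^-]*-\s*", "", n) for n in names]

print(greedy)    # ['QA PK', ' BN146', ' STP_NA', ' ZXU RMP LM']
print(anchored)  # ['RVA-QA PK', 'VA - BN146', 'STP_NA', 'ZXU RMP LM']
```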
Upvotes: 2
Reputation: 162
You can use Python's split() and strip() string methods.
split() returns a list of substrings.
For example,
m_string = "I-have-got-an-example"
result1 = m_string.split('-')
result1 is ['I', 'have', 'got', 'an', 'example']
If the hyphens are surrounded by spaces, the pieces keep that whitespace, so you have to use strip() as well. You can try this example:
m_string = "I - have - got- an -example"
result = [x.strip() for x in m_string.split('-')]
I hope this will be helpful for you.
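For the question's data, splitting on every hyphen also breaks up postfixes that contain hyphens. Passing maxsplit=1 splits only at the first hyphen, which (because the Prefix is letters only) is always the separator. A short sketch on the sample names; the variable names are illustrative:

```python
names = [
    "Intern - RVA-QA PK",
    "Fulltime - VA - BN146",
    "Intern - STP_NA",
    "Intern - ZXU RMP LM",
]

# Splitting on every '-' also cuts "RVA-QA PK" into two pieces
every = [[x.strip() for x in n.split('-')] for n in names]
print(every[0])  # ['Intern', 'RVA', 'QA PK']

# maxsplit=1: split only at the first '-', then strip the surrounding spaces
postfixes = [n.split('-', 1)[1].strip() for n in names]
print(postfixes)  # ['RVA-QA PK', 'VA - BN146', 'STP_NA', 'ZXU RMP LM']
```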
Upvotes: 1