Reputation: 39
How can I use a Python regular expression to extract data from the two strings below?
TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --In Progress
TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --Completed
I need to produce the following CSV file from them:
Format: TaskID,Priority,Status
TASK000123,P1,In Progress
TASK000123,P1,Completed
How can I do this? Thanks for helping me out
Upvotes: 1
Views: 84
Reputation: 521053
Here is an option using re.findall:
import re

# Named "text" rather than "input" to avoid shadowing the built-in input()
text = "TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --In Progress\nTASK000123-Tomcat server hosted on tbu.test1 is down-P1 --Completed"
results = re.findall(r"(TASK\d+).*?-(P\d+) --(.*)(?=\n|$)", text)
print(results)
[('TASK000123', 'P1', 'In Progress'), ('TASK000123', 'P1', 'Completed')]
Note that DOTALL mode is not necessary here, because we never need .* to match across newlines. The pattern also works without MULTILINE mode, because the lookahead checks for \n explicitly, so $ only has to match at the very end of the string.
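Since the question asks for a CSV file, here is a minimal sketch of how the tuples returned by re.findall could be passed to the standard csv module. The names text, rows, buffer and csv_text are just illustrative, and the in-memory buffer stands in for a real file (an assumption made so the example is self-contained).

```python
import csv
import io
import re

text = (
    "TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --In Progress\n"
    "TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --Completed"
)

# Same pattern as in the answer: task ID, priority, then the status up to the end of the line.
rows = re.findall(r"(TASK\d+).*?-(P\d+) --(.*)(?=\n|$)", text)

# Write the header row followed by one CSV row per matched tuple.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["TaskID", "Priority", "Status"])
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the buffer can be replaced with a file opened as open("tasks.csv", "w", newline=""), where tasks.csv is a hypothetical file name.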
Upvotes: 2
Reputation: 82765
This is one approach using a simple iteration.
Ex:
s = """TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --In Progress
TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --Completed"""
result = [["TaskID","Priority","Status"]]
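for i in s.splitlines():
    val = i.split("-")  # Split on '-'; the "--" before the status produces an empty field
    # strip() removes the space that follows the priority ("P1 " -> "P1")
    result.append([val[0], val[2].strip(), val[-1].strip()])
print(result)
Output:
[['TaskID', 'Priority', 'Status'],
['TASK000123', 'P1', 'In Progress'],
['TASK000123', 'P1', 'Completed']]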
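One caveat with splitting on every "-": the field positions shift if the description itself contains a hyphen. A possible variant, sketched below, anchors on the ends of the line instead. The helper name parse_line and the second sample line (with "Back-end" in the description) are hypothetical, added only to illustrate the case.

```python
def parse_line(line):
    """Split one task line into [task_id, priority, status]."""
    # The status is whatever follows the last " --".
    head, status = line.rsplit(" --", 1)
    # The priority follows the last "-" of the remaining text.
    head, priority = head.rsplit("-", 1)
    # The task ID is everything before the first "-".
    task_id = head.split("-", 1)[0]
    return [task_id, priority.strip(), status.strip()]


lines = [
    "TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --In Progress",
    # Hypothetical line with a hyphen inside the description
    "TASK000456-Back-end server is down-P2 --Completed",
]

result = [["TaskID", "Priority", "Status"]]
for line in lines:
    result.append(parse_line(line))
print(result)
```

This still assumes that every line ends with "-<priority> --<status>"; a line that does not follow that layout would raise a ValueError when unpacking.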
Upvotes: 2