Abeer Sheikh

Reputation: 39

Regular expressions python data extraction

How can I use a Python regular expression to extract data from the two strings below?

TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --In Progress

TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --Completed

I need the following CSV file from them:

Format: TaskID,Priority,Status

TASK000123,P1,In Progress

TASK000123,P1,Completed

How can I do this? Thanks for your help.

Upvotes: 1

Views: 84

Answers (2)

Tim Biegeleisen

Reputation: 521053

Here is an option using re.findall:

import re

# "text" instead of "input", so the built-in input() is not shadowed
text = "TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --In Progress\nTASK000123-Tomcat server hosted on tbu.test1 is down-P1 --Completed"
results = re.findall(r"(TASK\d+).*?-(P\d+) --(.*)(?=\n|$)", text)
print(results)

[('TASK000123', 'P1', 'In Progress'), ('TASK000123', 'P1', 'Completed')]

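Note that DOTALL mode (re.DOTALL) is not needed here, because .* never has to match across a newline. MULTILINE mode is not needed either: without it, $ matches only at the end of the string (or before a final newline), but the lookahead (?=\n|$) also accepts the newline that ends each earlier record.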
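The findall call above stops at a list of tuples, while the question asks for a CSV file with a TaskID,Priority,Status header. Below is a minimal sketch of that last step, assuming the records arrive as one newline-separated string. It anchors a similar pattern to whole lines with re.MULTILINE and writes the rows with the standard csv module. The function name tasks_to_csv is hypothetical, and io.StringIO stands in for a real file; when writing to disk, open the file with newline="" as the csv documentation recommends.

```python
import csv
import io
import re

# Anchored per line via re.MULTILINE. The greedy ".*-" backtracks to the last
# "-P<digits> --", so hyphens inside the description do not break the match.
TASK_RE = re.compile(r"^(TASK\d+)-.*-(P\d+) --(.+)$", re.MULTILINE)


def tasks_to_csv(text, out):
    """Write a TaskID,Priority,Status header plus one row per record to `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["TaskID", "Priority", "Status"])
    writer.writerows(TASK_RE.findall(text))  # each match is (task, priority, status)


text = (
    "TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --In Progress\n"
    "TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --Completed"
)

buf = io.StringIO()
tasks_to_csv(text, buf)
print(buf.getvalue())
```

Using re.MULTILINE with ^ and $ makes each match correspond to exactly one line, which removes the need for the (?=\n|$) lookahead.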

Upvotes: 2

Rakesh

Reputation: 82765

This is one approach using simple iteration and str.split, with no regex needed.

Ex:

s = """TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --In Progress
TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --Completed"""

result = [["TaskID","Priority","Status"]]

for line in s.splitlines():
    val = line.split("-")                               # Split by '-'
    result.append([val[0], val[2].strip(), val[-1]])    # strip() drops the space before "--"
print(result)

Output:

[['TaskID', 'Priority', 'Status'],
 ['TASK000123', 'P1', 'In Progress'],
 ['TASK000123', 'P1', 'Completed']]
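Indexing into split("-") assumes the description never contains a hyphen. A title such as "web-01 is down" would shift val[2] to the wrong field. Below is a minimal sketch of a split-only variant that avoids this by splitting from the ends of each line, assuming every line keeps the TASKID-description-PRIORITY --STATUS shape. It also writes the requested CSV with the standard csv module. The helper name parse_line is hypothetical.

```python
import csv
import sys


def parse_line(line):
    """Split 'TASKID-description-PRIORITY --STATUS' into (task_id, priority, status)."""
    head, _, status = line.rpartition(" --")   # status follows the last " --"
    task_id = head.split("-", 1)[0]            # task ID precedes the first "-"
    priority = head.rsplit("-", 1)[1]          # priority follows the last "-"
    return task_id, priority.strip(), status.strip()


s = """TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --In Progress
TASK000123-Tomcat server hosted on tbu.test1 is down-P1 --Completed"""

# Print CSV to stdout; swap sys.stdout for open("tasks.csv", "w", newline="") to write a file.
writer = csv.writer(sys.stdout, lineterminator="\n")
writer.writerow(["TaskID", "Priority", "Status"])
for line in s.splitlines():
    writer.writerow(parse_line(line))
```

Because only the first and last hyphens are used, any hyphens in the description stay inside the ignored middle part.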

Upvotes: 2
