Reputation: 125
I am trying to clean up some logs and want to extract general information from the messages. I am new to Python, only learned regular expressions yesterday, and have now run into a problem.
My messages look like this:
Report ZSIM_RANDOM_DURATION_ started
Report ZSIM_SYSTEM_ACTIVITY started
Report /BDL/TASK_SCHEDULER started
Report ZSIM_JOB_CREATE started
Report RSBTCRTE started
Report SAPMSSY started
Report RSRZLLG_ACTUAL started
Report RSRZLLG started
Report RGWMON_SEND_NILIST started
I tried this code:
clean_special2=re.sub(r'^[Report] [^1-9] [started]','',text)
but I think this code will remove entire rows, whereas I want to keep the format Report ... started. So I only want to remove the job name in the middle.
I expect each line of my output to look like this:
Report started
Can anyone help me with an idea? Thank you very much!
Upvotes: 2
Views: 6425
Reputation: 3515
This should do it: '^Report\ [^\ ]*\ started'
Regex is black magic, only use it when you have to. Online tools make it much easier to write: https://regex101.com/
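As a rough sketch of how the pattern above could be used with re.sub: the sample lines are copied from the question, and the variable names (text, pattern, cleaned) are only illustrative. It assumes all log lines sit in one multi-line string, which is why re.MULTILINE is passed so that ^ matches at the start of every line.

```python
import re

# Sample lines copied from the question (assumed to be one multi-line string).
text = """Report ZSIM_RANDOM_DURATION_ started
Report /BDL/TASK_SCHEDULER started
Report RSBTCRTE started"""

# "Report", one space, a run of non-space characters (the job name),
# one space, then "started". re.MULTILINE lets ^ match at each line start.
pattern = re.compile(r'^Report\ [^\ ]*\ started', flags=re.MULTILINE)

# Replace each whole match with the fixed text that should remain.
cleaned = pattern.sub('Report started', text)
print(cleaned)
```

Because the pattern matches the whole line, the replacement has to spell out the part you want to keep.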
Upvotes: 1
Reputation: 45
I don't know the Python syntax well, but I am fairly sure this regex will match your strings:
/^Report\W+([\w&.#@%^!~-]+)\W+started/m
In Python it might look like this, replacing each whole match with the text you want to keep:
text = "Report ZSIM_RANDOM_DURATION_ started"
clean_special2 = re.sub(r'^Report\W+([\w&.#@%^!~-]+)\W+started', 'Report started', text, flags=re.M)
Upvotes: 1
Reputation: 60190
Try something like this:
clean_special2=re.sub(r'(?<=^Report\b).*(?=\bstarted)',' ',text)
Explanation: (?<=...) is a positive lookbehind, i.e. the text before the match must fit the content of this group, but it is not part of the match and thus not replaced. The same applies on the other side with the positive lookahead (?=...). The \b is a word boundary, so everything between these two words is matched. Since this also removes the surrounding whitespace, the replacement is a single space.
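Here is a small sketch of the lookaround approach applied to several lines at once. The sample text and the names log_text, lookaround and result are made up for illustration; it assumes the log is held in one multi-line string, so re.MULTILINE is added to let ^ match at the start of each line. Lines that do not have the Report ... started shape are left alone.

```python
import re

# Two lines from the question plus one unrelated line (hypothetical sample).
log_text = """Report ZSIM_JOB_CREATE started
Report /BDL/TASK_SCHEDULER started
Some other log line"""

# Match only what lies between "Report" and "started"; the lookbehind and
# lookahead check the surrounding words without including them in the match.
lookaround = re.compile(r'(?<=^Report\b).*(?=\bstarted)', flags=re.MULTILINE)

# The matched middle part (job name plus surrounding spaces) becomes one space.
result = lookaround.sub(' ', log_text)
print(result)
```

Python's re module only accepts fixed-width lookbehinds; this one qualifies because ^ and \b are zero-width and Report is always six characters.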
Upvotes: 3