zihan meng

Reputation: 125

How to remove strings between two characters using a regular expression in Python

I am trying to clean up some logs and want to extract general information from the messages. I am new to Python and only learned regular expressions yesterday, and now I have a problem.

My messages look like this:

 Report ZSIM_RANDOM_DURATION_ started
 Report ZSIM_SYSTEM_ACTIVITY started
 Report /BDL/TASK_SCHEDULER started
 Report ZSIM_JOB_CREATE started
 Report RSBTCRTE started
 Report SAPMSSY started
 Report RSRZLLG_ACTUAL started
 Report RSRZLLG started
 Report RGWMON_SEND_NILIST started

I tried this code:

clean_special2=re.sub(r'^[Report] [^1-9] [started]','',text)

but I think this code will remove entire rows, whereas I want to keep the format Report ... started. So I only want to remove the job name in the middle.

I expect my output to look like this:

Report started

Can anyone help me with an idea? Thank you very much!

Upvotes: 2

Views: 6425

Answers (3)

Roger Heathcote

Reputation: 3515

This should do it: '^Report\ [^\ ]*\ started'. The pattern matches the whole line, so replace each match with 'Report started'.

Regex is black magic; only use it when you have to. Online tools make it much easier to write: https://regex101.com/
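
For illustration, here is a minimal sketch of how that pattern could be used with re.sub. The sample text is taken from the question with the leading whitespace stripped (an assumption), and the re.MULTILINE flag and the 'Report started' replacement are additions that are not part of the pattern itself.

```python
import re

# Sample lines from the question, assuming no leading whitespace.
text = """Report ZSIM_RANDOM_DURATION_ started
Report /BDL/TASK_SCHEDULER started
Report RSBTCRTE started"""

# [^ ]* matches the job name: any run of characters that are not spaces.
# re.MULTILINE lets ^ match at the start of every line, not only the first.
pattern = re.compile(r'^Report [^ ]* started', flags=re.MULTILINE)

# The pattern consumes the whole line, so the replacement restores the fixed words.
cleaned = pattern.sub('Report started', text)
print(cleaned)
```

Each of the three sample lines comes out as "Report started". The backslash-escaped spaces in the original pattern are harmless but not required, so this sketch writes plain spaces.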

Upvotes: 1

karthy periyasamy

Reputation: 45

I don't know the Python syntax, but I am sure this regexp can help you match your string:

/^Report\W+([\w&.#@%^!~-]+)\W+started/m

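In Python it might look like this:

import re

text = "Report ZSIM_RANDOM_DURATION_ started"
# The pattern matches the whole line, so the replacement puts the fixed words back.
clean_special2 = re.sub(r'^Report\W+([\w&.#@%^!~-]+)\W+started', 'Report started', text)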
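
Since the pattern already wraps the job name in a capture group, the same idea can also pull the names out instead of discarding them. The sketch below is only an illustration: it swaps the character class for \S+ (an assumption made here, because the class above does not include "/" and so would not match a name like /BDL/TASK_SCHEDULER), and the variable names are hypothetical.

```python
import re

# Sample lines from the question, assuming no leading whitespace.
text = """Report ZSIM_RANDOM_DURATION_ started
Report /BDL/TASK_SCHEDULER started
Report RSBTCRTE started"""

# (\S+) captures the job name as any run of non-whitespace characters.
# This is a swapped-in simplification, not the character class from the answer.
job_line = re.compile(r'^Report\s+(\S+)\s+started$', flags=re.MULTILINE)

# With a single capture group, findall returns the captured job names.
job_names = job_line.findall(text)
print(job_names)
# ['ZSIM_RANDOM_DURATION_', '/BDL/TASK_SCHEDULER', 'RSBTCRTE']

# Replacing each whole match gives the cleaned-up lines.
cleaned = job_line.sub('Report started', text)
```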

Upvotes: 1

Lucero

Reputation: 60190

Try something like this:

clean_special2=re.sub(r'(?<=^Report\b).*(?=\bstarted)',' ',text)

Explanation: (?<=...) is a positive lookbehind, i.e. the text immediately before the match must match the content of this group, but it is not part of the match and thus not replaced. The same applies on the other side with the positive lookahead (?=...). The \b is a word boundary, so everything between these two words is matched. Since this also removes the surrounding whitespace, the replacement is a single space.
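
As a rough sketch of how this behaves on several log lines at once: the example below assumes the lines carry no leading whitespace and adds the re.MULTILINE flag (not in the snippet above) so that ^ matches at the start of every line.

```python
import re

# Sample lines from the question, assuming no leading whitespace.
text = """Report ZSIM_RANDOM_DURATION_ started
Report /BDL/TASK_SCHEDULER started
Report SAPMSSY started"""

# Lookbehind and lookahead only assert the surrounding words; the match
# itself is just the job name plus the whitespace on either side of it.
lookaround = re.compile(r'(?<=^Report\b).*(?=\bstarted)', flags=re.MULTILINE)

# Show what actually gets matched on the first line.
first = lookaround.search(text)
print(repr(first.group(0)))  # ' ZSIM_RANDOM_DURATION_ '

# Replace the matched middle part with a single space.
cleaned = lookaround.sub(' ', text)
print(cleaned)
```

Because "Report" and "started" are never part of the match, they survive the substitution untouched, and every sample line becomes "Report started".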

Upvotes: 3
