Reputation: 125
I am trying to clean up a log file and want to remove some special strings.
Example:
%/h > %/h Current value over threshold value
Pg/S > Pg/S Current value over threshold value
Pg/S > Pg/S No. of pages paged in exceeds threshold
MB < MB min. avg. value over threshold value
I have tried a few patterns, but none of them seem to work. For example:
re.sub(r'\w\w\/\s>\s\w','',text)
Is there a good way to remove this special pattern?
I want to remove the leading .../... > .../... part of each line.
I expect my output to contain only the useful words:
Current value over threshold value
No. of pages paged in exceeds threshold
min. avg. value over threshold value
Thank you for any idea!
Upvotes: 1
Views: 150
Reputation: 6098
This is a relatively long regex, but it gets the job done.
[%\w][\/\w]\/?[\/\s\w]\s?\<?\>?\s\s[\w%]\/?[a-zA-Z%]\/?[\w]?\s\s?\s?
Demo: https://regex101.com/r/ayh19b/4
Or you can do something like:
^[\s\S]*?(?=\w\w(?:\w|\.))
Demo: https://regex101.com/r/ayh19b/6
Upvotes: 1
Reputation: 120768
Assuming the structure of the file is:
[special-string] [< or >] [special-string] [message]
then this should work:
>>> import re
>>> rgx = re.compile(r'^[^<>]+[<>] +\S+ +', re.M)
>>>
>>> s = """
... %/h > %/h Current value over threshold value
... Pg/S > Pg/S Current value over threshold value
... Pg/S > Pg/S No. of pages paged in exceeds threshold
... MB < MB min. avg. value over threshold value
... """
>>>
>>> print(rgx.sub('', s))
Current value over threshold value
Current value over threshold value
No. of pages paged in exceeds threshold
min. avg. value over threshold value
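As a script rather than an interactive session, the same idea might look like the sketch below. It assumes every prefix has the form "token, < or >, token" separated by spaces; the names PREFIX and strip_prefix are made up for illustration. The pattern here is slightly stricter than the one above: it uses \S+ and literal spaces so that a match can never run across a line break.

```python
import re

# Hypothetical helper names. The pattern matches, at the start of each line:
# a non-space token, spaces, '<' or '>', spaces, a second token, spaces.
PREFIX = re.compile(r'^\S+ +[<>] +\S+ +', re.M)

def strip_prefix(text):
    """Remove the leading 'unit > unit ' part from every line of text."""
    return PREFIX.sub('', text)

log = "\n".join([
    "%/h > %/h Current value over threshold value",
    "Pg/S > Pg/S Current value over threshold value",
    "Pg/S > Pg/S No. of pages paged in exceeds threshold",
    "MB < MB min. avg. value over threshold value",
])

print(strip_prefix(log))
```

Lines that do not start with such a prefix are left unchanged, because the pattern only matches when the second field is < or >.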
Upvotes: 3
Reputation: 26600
Based on the pattern you are trying to match, it seems like you always know where the unwanted part is positioned in the line. You can do this without a regex: use split and slicing to get the section of interest, then use join to turn it back into a string for your final result.
The code below does the following:
s.split() - splits on whitespace, creating a list where each word is an entry
[3:] - slices the list, taking everything from index 3 onward (the fourth element, since indexing starts at 0)
' '.join() - converts the list back to a string, placing a space between each element
Demo:
s = "%/h > %/h Current value over threshold value"
res = ' '.join(s.split()[3:])
Output:
Current value over threshold value
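For a whole log rather than a single string, the same approach could be applied line by line. This is only a sketch: the function name drop_first_three is hypothetical, and it assumes the first three whitespace-separated fields of a line are always the part to discard. It uses the maxsplit argument of str.split so that the message itself is not split apart and rejoined.

```python
def drop_first_three(text):
    """Drop the first three whitespace-separated fields from each line."""
    cleaned = []
    for line in text.splitlines():
        # Split at most 3 times: three prefix fields plus the rest of the line.
        parts = line.split(None, 3)
        # Lines with fewer than four fields are kept unchanged.
        cleaned.append(parts[3] if len(parts) == 4 else line)
    return "\n".join(cleaned)

log = "\n".join([
    "%/h > %/h Current value over threshold value",
    "Pg/S > Pg/S No. of pages paged in exceeds threshold",
    "MB < MB min. avg. value over threshold value",
])

print(drop_first_three(log))
```

Unlike the regex approach, this does not check that the second field is < or >, so it is only safe when every line in the log follows the same layout.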
Upvotes: 3