Zars

Reputation: 41

Extract the text between pattern matches with RegEx in Python

I need some help with the following pattern; I have been struggling with it for many hours now. I have text like this:

<<12/24/2015 00:00  userrrr>>
********** Text all char and symbols ************
<<12/24/2015 00:00 CET userr>>
Text all char and symbols
<<12/24/2015 00:00 GMT+1 userrrr>> Text in same line
<<12/24/2015 00:00 CET userrr>>
Text all characters and symbols
<<12/24/2015 00:00 GMT+1 userrrrrrr>> Text in same line
More Text all characters and symbols
<<12/24/2015 00:00 CET userrrrr>>
More text all characters and symbols
<<12/24/2015 00:00 CET userrrrrrrrrrr>>
More Text all characters and symbols

Using the pattern:

(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2})(.*?(?=>>))(>>)

The datetime and everything between the arrows are matched correctly. Unfortunately, I cannot find a way to extract the text between the patterns. The final groups should be (left_arrows), (datetime), (user), (right_arrows), (text). The closest I got was with:

(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}\s\D{3}.*?(?=\s))\s(.*?(?=>>))(>>)((?s).*?(?=<<\d{2}/\d{2}))

But it doesn't match the first and the last records correctly. (I checked the result on pythex.org.)
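For reference, a minimal Python 3 sketch (standard re module only, using just the first two records of the sample) shows what the first pattern captures: each match ends at '>>', so the text after a header never belongs to any match, and the optional time zone and the user end up together, with leading spaces, in the third group.

```python
import re

# The first pattern from the question, unchanged.
HEADER = re.compile(r"(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2})(.*?(?=>>))(>>)")

# First two records of the sample text.
sample = """\
<<12/24/2015 00:00  userrrr>>
********** Text all char and symbols ************
<<12/24/2015 00:00 CET userr>>
Text all char and symbols
"""

# Only the header lines match; the lines in between are skipped entirely.
matches = HEADER.findall(sample)
for groups in matches:
    print(groups)
# ('<<', '12/24/2015 00:00', '  userrrr', '>>')
# ('<<', '12/24/2015 00:00', ' CET userr', '>>')
```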

Upvotes: 4

Views: 105

Answers (2)

soungalo

Reputation: 1338

I think the easiest way is to go over the file line by line and match each line against two different regexes: one for header lines and one for text lines. But if you really need to get it in one shot, you could do:

(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2})(.*?(?=>>))(>>)\n\*+([^\*]+)\*+\n
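The line-by-line approach could look roughly like the following minimal Python 3 sketch (the one-shot pattern above expects the text to be framed by asterisks, which is only true of the first record in the sample). It assumes that a header always starts a line, that every other line belongs to the most recent header, and that the user is the last whitespace-free token before '>>'; HEADER_LINE, SAMPLE and parse_lines are illustrative names, not part of the original answer.

```python
import re

# Hypothetical header regex for one line: <<date time [zone] user>> [same-line text]
HEADER_LINE = re.compile(
    r"(<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}[^>]*?)[ \t]+([^\s>]+)(>>)(.*)"
)

SAMPLE = """\
<<12/24/2015 00:00  userrrr>>
********** Text all char and symbols ************
<<12/24/2015 00:00 CET userr>>
Text all char and symbols
<<12/24/2015 00:00 GMT+1 userrrr>> Text in same line
<<12/24/2015 00:00 CET userrr>>
Text all characters and symbols
<<12/24/2015 00:00 GMT+1 userrrrrrr>> Text in same line
More Text all characters and symbols
<<12/24/2015 00:00 CET userrrrr>>
More text all characters and symbols
<<12/24/2015 00:00 CET userrrrrrrrrrr>>
More Text all characters and symbols
"""


def parse_lines(text):
    """Return one (datetime, user, text) tuple per header line."""
    records = []
    for line in text.splitlines():
        m = HEADER_LINE.match(line)
        if m:
            # A header starts a new record; keep any text after '>>' on that line.
            _, stamp, user, _, rest = m.groups()
            body = [rest.strip()] if rest.strip() else []
            records.append((stamp, user, body))
        elif records:
            # Every other line belongs to the most recent header.
            records[-1][2].append(line)
    return [(stamp, user, "\n".join(body)) for stamp, user, body in records]


for record in parse_lines(SAMPLE):
    print(record)
```

Because each line is classified on its own, the first record (no time zone) and the last record (no following header) need no special handling.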

Upvotes: 0

vks

Reputation: 67968

(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}\s\D{0,3}.*?(?=\s))\s(.*?(?=>>))(>>)((?s).*?(?=<<\d{2}/\d{2}|$))
                                                                                                ^^

You need to add |$ to the final lookahead so that the last record can match as well. See the demo:

https://regex101.com/r/fM9lY3/51
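Note that regex101 defaults to the PCRE flavour, where an inline (?s) inside a group only affects the rest of that group. Python's re treats it differently: since Python 3.11 a global flag that is not at the start of the pattern raises re.error, and earlier Python 3 versions emit a DeprecationWarning and apply it to the whole pattern, so '.' then also crosses newlines in the header part. A minimal Python 3 adaptation of the same single-regex idea is sketched below, under these assumptions: re.DOTALL is passed as a flag instead of (?s), the header groups exclude newlines explicitly, the user is the last whitespace-free token before '>>', and \Z (the very end of the string) replaces $. RECORD, SAMPLE and parse_records are illustrative names.

```python
import re

# Adapted single-pattern version (an assumption, not the answer's exact regex).
RECORD = re.compile(
    r"(<<)"                                      # 1: left arrows
    r"(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}[^\n>]*?)"  # 2: date, time and optional zone
    r"[ \t]+([^\s>]+)"                           # 3: user (last token before '>>')
    r"(>>)"                                      # 4: right arrows
    r"(.*?)"                                     # 5: text, may span several lines
    r"(?=<<\d{2}/\d{2}/\d{4}|\Z)",               # stop before next header or at end
    re.DOTALL,                                   # flag passed here, not as inline (?s)
)

SAMPLE = """\
<<12/24/2015 00:00  userrrr>>
********** Text all char and symbols ************
<<12/24/2015 00:00 CET userr>>
Text all char and symbols
<<12/24/2015 00:00 GMT+1 userrrr>> Text in same line
<<12/24/2015 00:00 CET userrr>>
Text all characters and symbols
<<12/24/2015 00:00 GMT+1 userrrrrrr>> Text in same line
More Text all characters and symbols
<<12/24/2015 00:00 CET userrrrr>>
More text all characters and symbols
<<12/24/2015 00:00 CET userrrrrrrrrrr>>
More Text all characters and symbols
"""


def parse_records(text):
    """Return (datetime, user, text) for every record, including first and last."""
    return [
        (m.group(2), m.group(3), m.group(5).strip())
        for m in RECORD.finditer(text)
    ]


for record in parse_records(SAMPLE):
    print(record)
```

The full five groups asked for in the question, (left_arrows), (datetime), (user), (right_arrows), (text), are available from m.groups() on each match.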

Upvotes: 1
