Reputation: 41
I need some help with the following pattern; I have been struggling with it for many hours now. I have text like:
<<12/24/2015 00:00 userrrr>>
********** Text all char and symbols ************
<<12/24/2015 00:00 CET userr>>
Text all char and symbols
<<12/24/2015 00:00 GMT+1 userrrr>> Text in same line
<<12/24/2015 00:00 CET userrr>>
Text all characters and symbols
<<12/24/2015 00:00 GMT+1 userrrrrrr>> Text in same line
More Text all characters and symbols
<<12/24/2015 00:00 CET userrrrr>>
More text all characters and symbols
<<12/24/2015 00:00 CET userrrrrrrrrrr>>
More Text all characters and symbols
Using the pattern:
(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2})(.*?(?=>>))(>>)
The datetime and everything between the arrows is matched correctly. Unfortunately, I cannot find a way to extract the text between the patterns. The final groups should look like (left_arrows), (datetime), (user), (right_arrows), (text). The closest I got was by using:
(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}\s\D{3}.*?(?=\s))\s(.*?(?=>>))(>>)((?s).*?(?=<<\d{2}/\d{2}))
But it doesn't match the first and the last entries correctly. The result can be checked on pythex.org.
Upvotes: 4
Views: 105
Reputation: 1338
I think the easiest way will be to go over the file line by line and try to match each line against different regexes, one for header lines and one for text lines. But if you really need to get it in one shot, you could do:
(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2})(.*?(?=>>))(>>)\n\*+([^\*]+)\*+\n
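For comparison, the line-by-line approach could be sketched roughly like this in Python. This is only an illustration: the names HEADER and parse_entries are made up here, and the optional timezone token and the dict layout are assumptions based on the sample data in the question.

```python
import re

# Header line: <<MM/DD/YYYY HH:MM [timezone] user>> optionally followed by text.
# Assumption: timezone and user are single tokens without spaces or '>'.
HEADER = re.compile(
    r"(<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2})\s+"
    r"(?:([^\s>]+)\s+)?([^\s>]+)(>>)\s*(.*)"
)

def parse_entries(text):
    """Return one dict per <<...>> header, with the text that follows it."""
    entries = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            # A header line starts a new entry; keep any text on the same line.
            entries.append({
                "datetime": m.group(2),
                "timezone": m.group(3),  # None when the header has no timezone
                "user": m.group(4),
                "text": [m.group(6)] if m.group(6) else [],
            })
        elif entries:
            # Any other line belongs to the most recent header.
            entries[-1]["text"].append(line)
    for entry in entries:
        entry["text"] = "\n".join(entry["text"]).strip()
    return entries
```

Lines that appear before the first header are ignored in this sketch; whether that is the right behaviour depends on the real input.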
Upvotes: 0
Reputation: 67968
(\<<)(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}\s\D{0,3}.*?(?=\s))\s(.*?(?=>>))(>>)((?s).*?(?=<<\d{2}/\d{2}|$))
You need to add |$ to the final lookahead for the last entry to match. See the demo:
https://regex101.com/r/fM9lY3/51
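A minimal Python sketch of the same idea, with an end-of-input alternative in the final lookahead, might look like the following. It is a simplified variant, not the exact pattern above: ENTRY and split_entries are hypothetical names, the timezone and user are captured together in one group, and the scoped form (?s:...) is used because Python 3.11 and later reject an inline (?s) that is not at the start of the pattern.

```python
import re

ENTRY = re.compile(
    r"(<<)"                              # left arrows
    r"(\d{2}/\d{2}/\d{4}\s\d{2}:\d{2})"  # datetime
    r"\s(.*?)"                           # optional timezone plus user
    r"(>>)"                              # right arrows
    r"(?s:(.*?))"                        # text, allowed to span lines
    r"(?=<<\d{2}/\d{2}|$)"               # stop at the next header or at the end
)

def split_entries(text):
    """Return (datetime, user_part, text) tuples for every entry in text."""
    return [
        (m.group(2), m.group(3), m.group(5).strip())
        for m in ENTRY.finditer(text)
    ]
```

Without the |$ alternative, the lazy text group has no following header to stop at, so the last entry is dropped.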
Upvotes: 1