How to retrieve wanted string with re from lines

Question

 Tue Aug 21 17:02:26 2018 (gtgrhrthrhrhrthhhthrthrhrh)
 fjfpjpgporejpejgjr[eh[[[jh[j[ej[[ej[ej[e]]]]
 fkw[kgkeg[ekrk[ekg[kergk[erkg[eg[kg]
 Tue Aug 21 17:31:06 2018 ( ijwejfwfjwpfjwf[[few[jjfwfefwfeffeww]]
 fiowhfiweohewhfpwfhpfhpepwehfphpwhfpehfpwfh
 f,wfpewfefewgpwpg,pewgp
 Tue Aug 21 18:10:42 2018 ( reijpjfpjejferjfrejfpjefjer
 k[pfk[epkf[kr[ek[ke[gkk]
 r[g[keprkgpekg[rkg[pkg[ekg]

Above is an example of the content in the text file. I want to extract the timestamp at the start of each dated line with re. How should I construct the findall pattern to get the expected result below? I have tried the following:

  match=re.findall(r'[Tue\w]+2018$',data2)

but it is not working. I understand that $ is the symbol for the end of the string. How can I do it?

Expected Result is:

  Tue Aug 21 17:02:26 2018
  Tue Aug 21 17:31:06 2018
  Tue Aug 21 18:10:42 2018
           .
           .
           .

Paolo · Accepted Answer

Use the pattern:

^Tue.*?2018

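^ Asserts position at the beginning of a line (when re.MULTILINE is set).
Tue Matches the literal substring.
.*? Matches any characters except a newline, lazily (as few as possible).
2018 Matches the literal substring.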
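
For comparison, here is a minimal sketch of why the attempt in the question returns nothing. It uses a single line from the sample data. The variable names `line`, `attempt` and `fixed` are only illustrative.

```python
import re

line = "Tue Aug 21 17:02:26 2018 (gtgrhrthrhrhrthhhthrthrhrh)"

# [Tue\w]+ is a character class: one or more of T, u, e or any word character.
# It cannot cross spaces or colons, and $ requires 2018 to end the string,
# but the line continues with " (gtgr...".
attempt = re.findall(r'[Tue\w]+2018$', line)
print(attempt)  # []

# Outside brackets, Tue is a literal substring; .*? then expands lazily
# until the first 2018 is reached.
fixed = re.findall(r'^Tue.*?2018', line)
print(fixed)  # ['Tue Aug 21 17:02:26 2018']
```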

Since you are working with a multiline string and you want to match the pattern at the beginning of each line, not just at the beginning of the whole string, you have to use the re.MULTILINE flag.
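
A small sketch of what the flag changes, assuming a shortened two-entry version of the sample text (the name `text` and the line `noise` are placeholders, not part of the original data):

```python
import re

text = "Tue Aug 21 17:02:26 2018 (a)\nnoise\nTue Aug 21 17:31:06 2018 (b)\n"

# Without re.MULTILINE, ^ matches only at the very start of the string.
without_flag = re.findall(r'^Tue.*?2018', text)
print(without_flag)  # ['Tue Aug 21 17:02:26 2018']

# With re.MULTILINE, ^ also matches immediately after each newline.
with_flag = re.findall(r'^Tue.*?2018', text, re.MULTILINE)
print(with_flag)  # ['Tue Aug 21 17:02:26 2018', 'Tue Aug 21 17:31:06 2018']
```

The full example with the data from the question follows.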

import re
mystr="""
Tue Aug 21 17:02:26 2018 (gtgrhrthrhrhrthhhthrthrhrh)
fjfpjpgporejpejgjr[eh[[[jh[j[ej[[ej[ej[e]]]]
fkw[kgkeg[ekrk[ekg[kergk[erkg[eg[kg]
Tue Aug 21 17:31:06 2018 ( ijwejfwfjwpfjwf[[few[jjfwfefwfeffeww]]
fiowhfiweohewhfpwfhpfhpepwehfphpwhfpehfpwfh
f,wfpewfefewgpwpg,pewgp
Tue Aug 21 18:10:42 2018 ( reijpjfpjejferjfrejfpjefjer
k[pfk[epkf[kr[ek[ke[gkk]
r[g[keprkgpekg[rkg[pkg[ekg]
"""

print(re.findall(r'^Tue.*?2018', mystr, re.MULTILINE))

Prints:

['Tue Aug 21 17:02:26 2018', 'Tue Aug 21 17:31:06 2018', 'Tue Aug 21 18:10:42 2018']
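
If the file may also contain other weekdays or years, the pattern can be generalized. The following is only a sketch: the name `STAMP` and the stricter pattern are assumptions, not part of the answer above. It also assumes the default C/English locale, because `%a` and `%b` in `datetime.strptime` are locale-dependent.

```python
import re
from datetime import datetime

log = """Tue Aug 21 17:02:26 2018 (gtgrhrthrhrhrthhhthrthrhrh)
fkw[kgkeg[ekrk[ekg[kergk[erkg[eg[kg]
Tue Aug 21 17:31:06 2018 ( ijwejfwfjwpfjwf[[few[jjfwfefwfeffeww]]
f,wfpewfefewgpwpg,pewgp
Tue Aug 21 18:10:42 2018 ( reijpjfpjejferjfrejfpjefjer
"""

# Hypothetical stricter pattern: weekday, month, day, HH:MM:SS, four-digit year.
STAMP = re.compile(
    r'^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2} \d{4}',
    re.MULTILINE,
)

stamps = STAMP.findall(log)

# Convert the matched strings to datetime objects for sorting or arithmetic.
parsed = [datetime.strptime(s, '%a %b %d %H:%M:%S %Y') for s in stamps]

print(stamps)
print(parsed[0])  # 2018-08-21 17:02:26
```

Anchoring on the full timestamp shape instead of the literals Tue and 2018 avoids accidental matches on lines that merely start with "Tue" and contain "2018" somewhere later.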
