python re search syntax to identify text pattern spread over multiple lines

Question

I recently started working with python and regular expression. As a first project, I want to read out a pdf file, filter specific text data and recombine in a excel sheet. Hence I ran in a regex problem:

pdf file output format:

...

The text of this line is not always here

The community is here to help you with specific coding, algorithm, or language problems.

Summarize

ask

The text of this line is not always here

...

I want to search for "ask" and find it by "specific coding" and " Summarize ". The text below "ask" can not be used to find it reliably, because it is always different.

I tried to use (?=...) and (?<=...) for this but I couldn't figure out a suitable solution.

Maybe I'm doing something wrong. Has someone an idea?

The fourth bird · Accepted Answer

If you want to find ask you could use a capturing group instead of lookarounds. You could match specific coding followed by the rest of the line .*

If there are empty lines and newlines after it, you could use \s* to match those.

Then match a newline followed by Summarize.

Match the empty lines and a newline again and and capture ask in a capturing group

\bspecific coding\b.*\s*
?
Summarize\s*
?
(ask)\b

Regex demo

python re search syntax to identify text pattern spread over multiple lines

Answers (1)

Related Questions