regex_struggler
regex_struggler

Reputation: 13

python re search syntax to identify text pattern spread over multiple lines

I recently started working with python and regular expression. As a first project, I want to read out a pdf file, filter specific text data and recombine in a excel sheet. Hence I ran in a regex problem:

pdf file output format:

...

The text of this line is not always here\n

The community is here to help you with specific coding, algorithm, or language problems.\n

Summarize\n

ask\n

The text of this line is not always here\n

...

I want to search for "ask" and find it by "specific coding" and "\nSummarize\n". The text below "ask" can not be used to find it reliably, because it is always different.

I tried to use (?=...) and (?<=...) for this but I couldn't figure out a suitable solution.

Maybe I'm doing something wrong. Has someone an idea?

Upvotes: 1

Views: 54

Answers (1)

The fourth bird
The fourth bird

Reputation: 163447

If you want to find ask you could use a capturing group instead of lookarounds. You could match specific coding followed by the rest of the line .*

If there are empty lines and newlines after it, you could use \s* to match those.

Then match a newline followed by Summarize.

Match the empty lines and a newline again and and capture ask in a capturing group

\bspecific coding\b.*\s*\r?\nSummarize\s*\r?\n(ask)\b

Regex demo

Upvotes: 1

Related Questions