Yamane Imad

Reputation: 43

Extract quotes from text with Python and regex

Let's say we have a text in which quotes are stored in the form:

user:quote

A text can contain multiple quotes, for example:

Agatha Drake: She records her videos from the future? What is she, a
  f**ing time lord? Is she Michael J. Fox?

Harvey Spencer: This is just like that one movie where that one guy
  changed one tiny, little thing in his childhood to stop the girl of
  his dreams from being a crackhead in the future!

How can I extract the quotes ("She records her videos from ...", "This is just like that one movie ...") from the text in Python?

I tried

import re
re.findall(r'\S\:\s?(.*)', text)

But it's not doing the job.

https://regex101.com/r/vH63Go/1
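One likely reason the attempt falls short: by default `.` does not match a newline, so `(.*)` stops at the end of the first line of each quote. A quick check, assuming the sample text is hard-wrapped exactly as displayed above and the quotes are separated by a blank line (both assumptions, since the raw string isn't shown):

```python
import re

# Assumption: sample text reconstructed from the question, hard-wrapped as
# displayed, with a blank line between the two quotes.
text = (
    "Agatha Drake: She records her videos from the future? What is she, a\n"
    "f**ing time lord? Is she Michael J. Fox?\n"
    "\n"
    "Harvey Spencer: This is just like that one movie where that one guy\n"
    "changed one tiny, little thing in his childhood to stop the girl of\n"
    "his dreams from being a crackhead in the future!\n"
)

# The pattern from the question, as a raw string so \S and \s reach the
# regex engine unchanged. "." stops at "\n", so only the first line of
# each quote ends up in the capture group.
matches = re.findall(r'\S\:\s?(.*)', text)
for m in matches:
    print(repr(m))
```

Under that assumption, each match holds only the first line of its quote; the continuation lines contain no colon and are never captured.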

How can I do it in Python?

Upvotes: 1

Views: 65

Answers (1)

Sebastian Proske

Reputation: 8413

If your string consistently follows the format of a user name at the start of a line, with a double newline (a blank line) ending each quote, you could use this:

(?m)^[^:\n]+:\s?((?:.+\n?)*)

It uses multiline mode and matches the start of a line, followed by characters that are neither : nor a newline, followed by :. It then captures all following non-empty lines; the first empty line ends the quote, because .+ needs at least one character.
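A minimal sketch of applying this pattern with Python's `re` module, assuming the sample text is hard-wrapped as shown in the question and the quotes are separated by a blank line. The captured group keeps the inner line breaks and the trailing newline, so each match is stripped here:

```python
import re

# Assumption: sample text reconstructed from the question (hard-wrapped,
# quotes separated by a blank line).
text = (
    "Agatha Drake: She records her videos from the future? What is she, a\n"
    "f**ing time lord? Is she Michael J. Fox?\n"
    "\n"
    "Harvey Spencer: This is just like that one movie where that one guy\n"
    "changed one tiny, little thing in his childhood to stop the girl of\n"
    "his dreams from being a crackhead in the future!\n"
)

# (?m)          -> multiline mode: ^ matches at the start of every line
# ^[^:\n]+:     -> the user name (no colon, no newline) followed by a colon
# \s?           -> an optional whitespace character after the colon
# ((?:.+\n?)*)  -> capture every following non-empty line; the blank line
#                  between quotes stops the repetition
pattern = re.compile(r'(?m)^[^:\n]+:\s?((?:.+\n?)*)')

# strip() drops the trailing newline; inner line breaks are kept.
quotes = [q.strip() for q in pattern.findall(text)]
for q in quotes:
    print(q)
    print("---")
```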

Here's a demo on regex101.
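If you also want the user names, and each quote joined back into a single line, a variant using `re.finditer` with named groups might look like this. The group names `user`/`quote` and the helper `extract_quotes` are hypothetical, not part of the answer; the pattern is otherwise the same, passing `re.MULTILINE` instead of the equivalent inline `(?m)` flag:

```python
import re

# Assumption: same reconstructed sample text as in the question.
text = (
    "Agatha Drake: She records her videos from the future? What is she, a\n"
    "f**ing time lord? Is she Michael J. Fox?\n"
    "\n"
    "Harvey Spencer: This is just like that one movie where that one guy\n"
    "changed one tiny, little thing in his childhood to stop the girl of\n"
    "his dreams from being a crackhead in the future!\n"
)

# The answer's pattern with hypothetical named groups for user and quote.
QUOTE_RE = re.compile(r'^(?P<user>[^:\n]+):\s?(?P<quote>(?:.+\n?)*)', re.MULTILINE)


def extract_quotes(text):
    """Hypothetical helper: return (user, quote) pairs found in text."""
    pairs = []
    for m in QUOTE_RE.finditer(text):
        # split() with no argument splits on any whitespace run, including
        # the newlines left by hard wrapping, so join() rebuilds one line.
        quote = " ".join(m.group("quote").split())
        pairs.append((m.group("user").strip(), quote))
    return pairs


pairs = extract_quotes(text)
for user, quote in pairs:
    print(f"{user}: {quote}")
```

Joining with `split()`/`join()` also collapses any indentation on the wrapped lines, which is convenient here but would flatten deliberate spacing inside a quote.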

Upvotes: 1
