hello_w

Reputation: 37

Extract substrings with regular expression

Let's say I have a string:

L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!

I need to extract the name (BIANCA) and the text at the end into two variables. I tried something like this:

import re

dialogue = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
name: str = ""
line: str = ""
name = re.findall(r'^L.*\s(.+?)\s.*', dialogue)

but I'm a little confused about how to use regular expressions. How can I solve this with a regular expression?

Thanks!

Upvotes: 0

Views: 421

Answers (3)

The fourth bird

Reputation: 163362

You could match L at the start of the string, then use a quantifier {n} to repeat the delimiter +++$+++ followed by non-whitespace characters a fixed number of times, and capture the last two fields.

^L\S*(?: \+{3}\$\+{3} \S+){2} \+{3}\$\+{3} (\S+) \+{3}\$\+{3} (.+)$

The pattern matches:

  • ^ Start of string
  • L\S* Match L followed by optional non-whitespace chars
  • (?: \+{3}\$\+{3} \S+){2} Using a quantifier, repeat 2 times matching the delimiter followed by 1+ non-whitespace chars
  • \+{3}\$\+{3} Match the delimiter
  • (\S+) Capture group 1, match 1+ non-whitespace chars to match BIANCA
  • \+{3}\$\+{3} Match the delimiter
  • (.+) Capture group 2, match 1+ times any char except a newline to match They do not!
  • $ End of string
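As a minimal sketch of how the pattern above might be used in Python: the snippet below applies it with re.match and unpacks the two capture groups. The name PATTERN and the empty-string fallback are illustrative assumptions, not part of the original answer.

```python
import re

# Pattern copied from the answer; PATTERN is a hypothetical name for this sketch.
PATTERN = re.compile(
    r'^L\S*(?: \+{3}\$\+{3} \S+){2} \+{3}\$\+{3} (\S+) \+{3}\$\+{3} (.+)$'
)

dialogue = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"

match = PATTERN.match(dialogue)
if match:
    # Group 1 is the name, group 2 is the spoken line.
    name, line = match.groups()
else:
    # Assumed fallback for lines that do not fit the expected layout.
    name, line = "", ""

print(name)
print(line)
```

Because (\S+) only matches non-whitespace characters, this sketch assumes the name is a single word; a name containing spaces would not match.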

Upvotes: 1

dawg

Reputation: 103844

You can use this regex:

[ \t]([^+]+)[ \t]\+{3}\$\+{3}[ \t]+([^+]+)$

Python:

import re

dialogue = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"

>>> re.findall(r'[ \t]([^+]+)[ \t]\+{3}\$\+{3}[ \t]+([^+]+)$', dialogue)
[('BIANCA', 'They do not!')]

You can also split and slice:

>>> re.split(r'[ \t]\+{3}\$\+{3}[ \t]', dialogue)[-2:]
['BIANCA', 'They do not!']

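But split and slice does not fail gracefully if +++$+++ is not found: re.split then returns the whole string as a single-element list, so the slice still yields a value. The search pattern above returns an empty list instead.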
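To make the difference concrete, here is a small sketch that runs both approaches on a line with no delimiter. The input string bad and the variable names are made up for illustration.

```python
import re

# Both patterns are taken from the answer above.
search_pattern = r'[ \t]([^+]+)[ \t]\+{3}\$\+{3}[ \t]+([^+]+)$'
delimiter = r'[ \t]\+{3}\$\+{3}[ \t]'

# Hypothetical malformed input that contains no +++$+++ delimiter.
bad = "no delimiter here"

found = re.findall(search_pattern, bad)
sliced = re.split(delimiter, bad)[-2:]

print(found)   # [] : nothing matched, which is easy to detect
print(sliced)  # ['no delimiter here'] : the whole input comes back
```

With findall, an empty list signals a malformed line. With split and slice, you would need to check the length of the result yourself.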

Upvotes: 1

balderman

Reputation: 23815

You can do that without re, using str.split:

data = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
parts = data.split('+++$+++')
print(parts[-2].strip())
print(parts[-1].strip())

Output:

BIANCA
They do not!
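If you want the plain split approach to reject malformed lines, one possible sketch is a small helper that checks the field count first. The function name parse_line and the assumption of exactly five fields per line (taken from the sample string) are mine, not from the question.

```python
def parse_line(raw, sep="+++$+++", expected_fields=5):
    """Return (name, text), or None if the line has too few fields.

    expected_fields=5 is an assumption based on the sample line.
    """
    # Limit the number of splits so a separator inside the text stays in the text.
    parts = [p.strip() for p in raw.split(sep, expected_fields - 1)]
    if len(parts) != expected_fields:
        return None
    return parts[-2], parts[-1]


data = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
print(parse_line(data))
print(parse_line("no delimiter here"))
```

Returning None rather than raising is just one design choice; raising ValueError would work equally well if malformed lines should stop processing.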

Upvotes: 1
