Reputation: 37
Let's say I have a string:
L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!
I need to extract the name (BIANCA) and the text at the end (They do not!) into two variables. I tried something like this:
import re

dialogue = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
name: str = ""
line: str = ""
name = re.findall(r'^L.*\s(.+?)\s.*', dialogue)
But I'm a little confused about using regular expressions. How can I solve this with a regular expression?
Thanks!
Upvotes: 0
Views: 421
Reputation: 163362
You could match L at the start of the string, and use a quantifier {n} to set the number of times to match +++$+++ followed by non-whitespace characters.
^L\S*(?: \+{3}\$\+{3} \S+){2} \+{3}\$\+{3} (\S+) \+{3}\$\+{3} (.+)$
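A minimal sketch of applying this pattern in Python with re.match, unpacking the two capture groups into the name and line variables from the question. The None check is an addition for illustration, not part of the answer:

```python
import re

# Pattern from the answer: skip the first three fields, capture the fourth
# field (the name) and everything after the last delimiter (the line).
PATTERN = re.compile(
    r"^L\S*(?: \+{3}\$\+{3} \S+){2} \+{3}\$\+{3} (\S+) \+{3}\$\+{3} (.+)$"
)

dialogue = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"

match = PATTERN.match(dialogue)
if match is not None:
    # groups() returns the two captured strings in order
    name, line = match.groups()
else:
    # The string did not have the expected layout
    name, line = "", ""

print(name)  # BIANCA
print(line)  # They do not!
```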
The pattern matches:
^
    Start of string
L\S*
    Match L followed by optional non-whitespace chars
(?: \+{3}\$\+{3} \S+){2}
    Using a quantifier, repeat 2 times: match the delimiter followed by 1+ non-whitespace chars
\+{3}\$\+{3}
    Match the delimiter
(\S+)
    Capture group 1: match 1+ non-whitespace chars (matches BIANCA)
\+{3}\$\+{3}
    Match the delimiter
(.+)
    Capture group 2: match 1+ of any char except a newline (matches They do not!)
$
    End of string
Upvotes: 1
Reputation: 103844
You can use this regex:
[ \t]([^+]+)[ \t]\+{3}\$\+{3}[ \t]+([^+]+)$
Python:
>>> import re
>>> dialogue = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
>>> re.findall(r'[ \t]([^+]+)[ \t]\+{3}\$\+{3}[ \t]+([^+]+)$', dialogue)
[('BIANCA', 'They do not!')]
You can also split and slice:
>>> re.split(r'[ \t]\+{3}\$\+{3}[ \t]', dialogue)[-2:]
['BIANCA', 'They do not!']
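But split and slice does not fail gracefully if +++$+++ is not found (it just returns the whole string in a one-element list), whereas the findall pattern above returns an empty list that you can check for.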
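To make the difference concrete, here is a small sketch; the helper name extract is hypothetical. With findall, a missing delimiter yields an empty list that can be turned into None, while split and slice silently hands back the unsplit string:

```python
import re

SEARCH_RE = r"[ \t]([^+]+)[ \t]\+{3}\$\+{3}[ \t]+([^+]+)$"
SPLIT_RE = r"[ \t]\+{3}\$\+{3}[ \t]"


def extract(dialogue):
    """Hypothetical helper: return (name, line), or None if no delimiter matches."""
    found = re.findall(SEARCH_RE, dialogue)
    if not found:
        return None  # findall returned [], so fail gracefully
    return found[0]  # first (name, line) tuple


good = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
bad = "no delimiter here"

print(extract(good))  # ('BIANCA', 'They do not!')
print(extract(bad))   # None

# Split and slice raises no error; it just returns the whole string
print(re.split(SPLIT_RE, bad)[-2:])  # ['no delimiter here']
```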
Upvotes: 1
Reputation: 23815
You can do that without the re module:
data = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"
parts = data.split('+++$+++')
print(parts[-2].strip())
print(parts[-1].strip())
Output:
BIANCA
They do not!
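A variant of the same idea, assuming only the last two fields are needed: str.rsplit with maxsplit=2 splits at most twice starting from the right, so the result unpacks straight into the name and line variables from the question:

```python
data = "L1045 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ They do not!"

# At most two splits from the right: [prefix, name, line]
_, name, line = data.rsplit("+++$+++", 2)
# Remove the spaces that surrounded each delimiter
name, line = name.strip(), line.strip()

print(name)  # BIANCA
print(line)  # They do not!
```

If the string contains fewer than two delimiters, the unpacking raises ValueError, so a malformed line fails loudly instead of producing the wrong fields.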
Upvotes: 1