Reputation: 157
I have the following string:
s = "<X> First <Y> Second"
and I can match any text right after <X> and <Y> (in this case "First" and "Second"). This is how I already did it:
import re
s = "<X> First <Y> Second"
pattern = r'\<([XxYy])\>([^\<]+)' # lower and upper case X/Y will be matched
items = re.findall(pattern, s)
print(items)
>>> [('X', ' First '), ('Y', ' Second')]
What I am now trying to match is the case without the angle brackets (<>):
s = "X First Y Second"
I tried this:
pattern = r'([XxYy]) ([^\<]+)'
>>> [('X', 'First Y Second')]
Unfortunately it's not producing the right result. What am I doing wrong? I want to match X or x or Y or y PLUS one whitespace (for instance "X "). How can I do that?
EDIT: this is a possible string too:
s = "<X> First one <Y> Second <X> More <Y> Text"
Output should be:
>>> [('X', ' First one '), ('Y', ' Second '), ('X', ' More '), ('Y', ' Text')]
EDIT2:
pattern = r'([XxYy]) ([^ ]+)'
s = "X First text Y Second"
produces:
[('X', 'First'), ('Y', 'Second')]
but it should be:
[('X', 'First text'), ('Y', 'Second')]
Upvotes: 1
Views: 143
Reputation: 157
So I came up with this solution:
pattern = r"([XxYy]) (.*?)(?= [XxYy] |$)"
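A quick check of this pattern follows. It is a minimal sketch that runs `re.findall` on the EDIT2 string and on an unbracketed version of the EDIT string. The second pattern, which makes the angle brackets optional with `<?` and `>?`, is a hypothetical extension (not part of the solution above) intended to let the bracketed strings from the question match as well. Note that the captured text comes back without the surrounding spaces.

```python
import re

# The solution pattern: capture X/x/Y/y, consume one space, then lazily take
# everything up to the next " X " / " Y " marker (or the end of the string).
pattern = r"([XxYy]) (.*?)(?= [XxYy] |$)"

print(re.findall(pattern, "X First text Y Second"))
# [('X', 'First text'), ('Y', 'Second')]

print(re.findall(pattern, "X First one Y Second X More Y Text"))
# [('X', 'First one'), ('Y', 'Second'), ('X', 'More'), ('Y', 'Text')]

# Hypothetical extension: make the angle brackets optional both in the
# marker and in the lookahead, so "<X> ..." and "X ..." are handled alike.
pattern_optional_brackets = r"<?([XxYy])>? (.*?)(?= <?[XxYy]>? |$)"

print(re.findall(pattern_optional_brackets,
                 "<X> First one <Y> Second <X> More <Y> Text"))
# [('X', 'First one'), ('Y', 'Second'), ('X', 'More'), ('Y', 'Text')]
```

The lazy `.*?` combined with the lookahead is what stops each capture at the next marker instead of running to the end of the string, which is what the greedy `[^\<]+` did in the attempt above.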
Upvotes: 0
Reputation: 17956
How about something like: <?[XY]>? ([^<>XY$ ]+)
Example in JavaScript:
const re = /<?[XY]>? ([^<>XY$ ]+)/ig
console.info('<X> First <Y> Second'.match(re))
console.info('X First Y Second'.match(re))
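For comparison, here is a rough Python translation of the JavaScript above, assuming `re.IGNORECASE` as the equivalent of the `i` flag. `String.prototype.match` with the `g` flag returns whole matches such as `'<X> First'`, whereas `re.findall` returns only the captured group. Because `X` and `Y` are excluded inside the character class, and case-insensitive matching applies there too, each capture stops at the first space or at any x/y. Multi-word text and words such as "Text" therefore come back shortened.

```python
import re

# Same pattern as the JavaScript answer; re.IGNORECASE stands in for the /i flag.
# Inside [...] the '$' is a literal dollar sign, not an end-of-string anchor.
pattern = re.compile(r'<?[XY]>? ([^<>XY$ ]+)', re.IGNORECASE)

for s in ['<X> First <Y> Second',
          'X First Y Second',
          '<X> First one <Y> Second <X> More <Y> Text']:
    print(pattern.findall(s))
# ['First', 'Second']
# ['First', 'Second']
# ['First', 'Second', 'More', 'Te']
```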
Upvotes: 2
Reputation: 122
Assuming that the whitespace to match is a single space character, the pattern is:
pattern = r'([XxYy]) ([^ ]+)'
Upvotes: 1
Reputation: 1130
If you know which whitespace character to match, you can just add it to your expression. If you want to match any whitespace, you can use \s:
pattern = r'\<([XxYy])\>([^\<]+)'
would then be
pattern = r'\<([XxYy])\>\s([^\<]+)'
Always keep in mind that the expression within the parentheses () is what will be returned as your result.
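Below is a small sketch of the `\s` version, run on a hypothetical input that uses a tab after `<X>`. `\s` accepts a space, tab or newline, but exactly one of them, so a marker with no whitespace after it is no longer matched.

```python
import re

pattern = r'\<([XxYy])\>\s([^\<]+)'

# \s matches one whitespace character of any kind (here a tab, then a space).
print(re.findall(pattern, "<X>\tFirst <Y> Second"))
# [('X', 'First '), ('Y', 'Second')]

# Without whitespace after the marker, \s has nothing to match, so <X> is skipped.
print(re.findall(pattern, "<X>First <Y> Second"))
# [('Y', 'Second')]
```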
Upvotes: 1