Reputation: 375
I want to split the following string:
lin=' <abc<hd <> "abc\"d\" ef" '
into
[<abc<hd <>, "abc\"d\" ef"]
However, when I split the string using re.findall(r'"(.*?)"', lin, 0), I get
['abc', ' ef']
Can someone please guide me on how to split the string in Python?
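A minimal check that uses only the string from the question shows what the pattern is actually working with. In an ordinary (non-raw) literal, \" is just an escaped quote, so the backslashes never reach the regex. The lazy (.*?) then stops at the very next quote, which is why the matches come out short.

```python
import re

lin = ' <abc<hd <> "abc\"d\" ef" '

# In a normal (non-raw) literal, \" is an escaped quote, so the backslashes
# are dropped when Python builds the string.
print(repr(lin))                      # ' <abc<hd <> "abc"d" ef" '
print('\\' in lin)                    # False

# The lazy (.*?) stops at the next quote, so each pair of quotes
# yields its own short match.
print(re.findall(r'"(.*?)"', lin))    # ['abc', ' ef']

# A raw string literal keeps the backslashes.
raw = r' <abc<hd <> "abc\"d\" ef" '
print('\\' in raw)                    # True
```

If the backslashes are meant to be part of the data, the input has to be written as a raw string (or with \\" escapes) in the first place.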
Upvotes: 0
Views: 1139
Reputation: 1872
Here is a solution using a regular expression:
import re
line = ' <abc<hd <> "abc\"d\" ef" '
match = list(re.findall(r'(<[^>]+>)\s+("(?:\"|[^"])+")', line)[0])
print(match)
#['<abc<hd <>', '"abc"d" ef"']
Another way to do it.
print(re.split(r'\s+(?=")', line.strip()))  # split on whitespace only if it is followed by a quote
#['<abc<hd <>', '"abc"d" ef"']
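In a raw pattern, \" simply matches a plain quote, so (?:\"|[^"]) above matches any character. If the input really contains backslash-escaped quotes (an assumption: the question's non-raw literal does not), the following sketch shows an escape-aware pattern. The variable name pattern is illustrative only.

```python
import re

# Assumption: the data really contains backslash-escaped quotes, so it is
# written as a raw string here and the backslashes survive.
line = r' <abc<hd <> "abc\"d\" ef" '

# Group 1: a tag that starts with '<' and runs to the first '>'.
# Group 2: a double-quoted string in which a backslash escapes the next
#          character, so \" does not end the string.
pattern = re.compile(r'(<[^>]+>)\s+("(?:\\.|[^"\\])*")')

m = pattern.search(line)
parts = [m.group(1), m.group(2)]
print(parts)      # ['<abc<hd <>', '"abc\\"d\\" ef"']
print(parts[1])   # "abc\"d\" ef"
```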
Upvotes: 4
Reputation: 239653
Yet another RegEx solution
import re
lin=' <abc<hd <> "abc\"d\" ef" '
matching = re.match(r'\s+(.*?)\s+("(.*)")', lin)  # raw string, so \s reaches the regex engine unchanged
print([matching.group(1), matching.group(2)])
Output
['<abc<hd <>', '"abc"d" ef"']
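re.match returns None when the pattern does not match, and calling .group() on None raises AttributeError. The following is a minimal guarded sketch. split_line is a hypothetical helper name, and the leading \s+ is relaxed to \s* (an assumption) so input without leading whitespace also matches.

```python
import re

def split_line(text):
    """Hypothetical helper: return [prefix, quoted_part], or None if no match."""
    m = re.match(r'\s*(.*?)\s+("(.*)")', text)
    if m is None:  # re.match returns None when the pattern fails
        return None
    return [m.group(1), m.group(2)]

print(split_line(' <abc<hd <> "abc\"d\" ef" '))  # ['<abc<hd <>', '"abc"d" ef"']
print(split_line('no quotes here'))              # None
```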
Upvotes: 1
Reputation: 60024
Firstly, you have some extra whitespace at the beginning and end of your string, so calling lin.strip() will remove that.
You can then use str.split() to split at the first ":
>>> lin.strip().split(' "', 1)
['<abc<hd <>', 'abc"d" ef"']
The 1 passed as the second argument (maxsplit) tells Python to split only once, so it won't split at any later occurrence of the separator.
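If you want to keep the opening quote on the second piece, str.partition is one option (a sketch, not the only way). It splits at the first occurrence of the separator and also returns the separator itself, so the quote can be put back.

```python
lin = ' <abc<hd <> "abc\"d\" ef" '

# str.partition splits at the first occurrence only and returns
# (before, separator, after); if the separator is missing, the last two are ''.
head, sep, tail = lin.strip().partition(' "')
result = [head, sep.strip() + tail]   # sep.strip() turns ' "' back into '"'
print(result)                         # ['<abc<hd <>', '"abc"d" ef"']
```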
Upvotes: 3
Reputation:
>>> lin=' <abc<hd <> "abc\"d\" ef" '
>>> lin.split('"', 1)
[' <abc<hd <> ', 'abc"d" ef" ']
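To see what the maxsplit argument of 1 changes here, you can compare it with a plain split. The following sketch also strips the surrounding whitespace and restores the opening quote that the split consumes. The names head, tail and tidy are illustrative only.

```python
lin = ' <abc<hd <> "abc\"d\" ef" '

# Without a maxsplit argument, every quote is a split point.
print(lin.split('"'))             # [' <abc<hd <> ', 'abc', 'd', ' ef', ' ']

# With maxsplit=1 only the first quote is used; strip the pieces and
# put the consumed opening quote back on the second one.
head, tail = lin.split('"', 1)
tidy = [head.strip(), '"' + tail.strip()]
print(tidy)                       # ['<abc<hd <>', '"abc"d" ef"']
```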
Upvotes: 0