Reputation: 6197
I am trying to split a string by |INDEX| and /.
re.split can handle multiple separators. The alternatives in the pattern are separated by pipes, so a literal pipe inside a separator needs to be escaped.
I tried separating with:
import re
a = 'Tokenized/0003036v1|INDEX|3847.story.json'
re.split(r"/|\|INDEX|\|", a)
However, this produced an extra, empty element:
['Tokenized', '0003036v1', '', '3847.story.json']
Why are there four items in the list, including an empty one, instead of three?
Upvotes: 1
Views: 30
Reputation: 147206
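You have an error in your regex: there is an extra, unescaped | before the closing \| of |INDEX|. That turns the pattern into three alternatives (/, \|INDEX and \|), so the string is split on |INDEX and then again on the lone | that follows it, leaving an empty string between the two matches. Change the regex to this: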
re.split( r"/|\|INDEX\|" , a)
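To see exactly which separators each pattern matches, you can list the matches with re.finditer. This is a small illustrative check using only the standard re module; the variable names buggy and fixed are just labels chosen for this sketch:

```python
import re

a = 'Tokenized/0003036v1|INDEX|3847.story.json'

buggy = r"/|\|INDEX|\|"   # three alternatives: "/", "|INDEX", "|"
fixed = r"/|\|INDEX\|"    # two alternatives: "/", "|INDEX|"

# Collect the text consumed by each separator match, left to right.
buggy_seps = [m.group() for m in re.finditer(buggy, a)]
fixed_seps = [m.group() for m in re.finditer(fixed, a)]

print(buggy_seps)  # ['/', '|INDEX', '|']
print(fixed_seps)  # ['/', '|INDEX|']
```

The buggy pattern produces two adjacent matches, |INDEX and |, with nothing between them, and re.split returns that nothing as the empty string.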
Upvotes: 1
Reputation: 1392
Instead of
re.split(r"/|\|INDEX|\|", a)
use this:
re.split(r"/|\|INDEX\|", a)
# Splitting one step at a time with the maxsplit argument shows where the problem appears:
>>> re.split(r"/|\|INDEX|\|", a, maxsplit=1)
['Tokenized', '0003036v1|INDEX|3847.story.json']
>>> re.split(r"/|\|INDEX|\|", a, maxsplit=2)
['Tokenized', '0003036v1', '|3847.story.json']
>>> re.split(r"/|\|INDEX|\|", a, maxsplit=3)
['Tokenized', '0003036v1', '', '3847.story.json']
>>> re.split(r"/|\|INDEX\|", a)
['Tokenized', '0003036v1', '3847.story.json']
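One way to avoid hand-escaping mistakes like this is to build the pattern from a list of literal separators with re.escape, so the only unescaped pipes are the ones joining the alternatives. This is a minimal sketch, not the only approach; the name separators is hypothetical:

```python
import re

a = 'Tokenized/0003036v1|INDEX|3847.story.json'

# Hypothetical list of literal separators; none of them is treated as regex syntax.
separators = ['|INDEX|', '/']

# Sort longest first so a longer separator is tried before a shorter one it starts with.
ordered = sorted(separators, key=len, reverse=True)

# re.escape makes each separator literal (e.g. '|' becomes '\|'),
# and '|'.join puts an unescaped alternation bar only between them.
pattern = '|'.join(map(re.escape, ordered))

parts = re.split(pattern, a)
print(parts)  # ['Tokenized', '0003036v1', '3847.story.json']
```

With this construction a lone | in the input is not a separator, because the escaped pattern only matches the full |INDEX| sequence.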
Upvotes: 1