Reputation: 1460
I have the following string:
txt='agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'
These are the delimiters:
delimiters = " \t,;.?!-:@[](){}_*/"
As output, I want this list of values:
"agadsfa","2asdf","sdfsaf","asfsadf","adsf","klnalfk","jn234kmafs","adfs","nlnawr23"
I tried using regex:
re.split(delimiters,txt)
But I'm getting this error:
re.error: unterminated character set at position 10
What is wrong here?
Upvotes: 1
Views: 405
Reputation: 31
Try this: replace each run of delimiters with a comma, then split on the commas:
import re
txt = "agadsfa_(2asdf_sdfsaf)asfs?adf[adsf_klna!lfk;jn234kmafs)adfs, nlnawr*23"
line = re.sub(
r"[ \t,;\.?!\-:@\[\](){}_*/]+",
r",",
txt
)
print(line.split(","))
Upvotes: 0
Reputation: 235984
Your regular expression is incorrect: re.split() treats the string as a pattern, not as a list of characters, and the [ in it opens a character set that is never closed. And from the comments, you've added the requirement that the delimiters string is not to be touched.
What we need to do, then, is process the delimiters string and convert it into a proper regex that can be used by split(). Here's how:
# need to enclose regex in [], we want to split on any of
# the chars; also some of the chars need to be escaped
delimiters = ' \t,;.?!-:@[](){}_*/'
regex = delimiters.replace(']', r'\]').replace('-', r'\-')
regex = r'[{}]+'.format(regex)
The result is as expected:
txt = 'agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'
re.split(regex, txt)
=> ['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']
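The manual replace() calls above cover only the two characters that cause trouble in this particular string. As a more general sketch (not part of the original answer), the standard library's re.escape can do the escaping, so the delimiters string stays untouched whatever it contains. The names pattern and parts are only illustrative.

```python
import re

delimiters = " \t,;.?!-:@[](){}_*/"
txt = "agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23"

# re.escape backslash-escapes the regex metacharacters, so the result is
# safe inside a character set; "+" treats a run of delimiters as one.
pattern = re.compile("[" + re.escape(delimiters) + "]+")

# Drop empty strings, which appear if txt starts or ends with a delimiter.
parts = [p for p in pattern.split(txt) if p]
print(parts)
```

Compiling the pattern once is optional; it only helps if the same delimiters are applied to many strings.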
Upvotes: 2
Reputation: 3265
Separate your delimiters with | and escape the ones that have a special meaning in a regex:
delimiters = r' |\t|,|;|\.|\?|!|-|:|@|\[|\]|\(|\)|\{|\}|_|\*|/'
# then use this to eliminate empty strings if you have two delimiters next to each other
print([w for w in re.split(delimiters,txt) if w])
# or list(filter(lambda a: a, re.split(delimiters,txt)))
The result is:
['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']
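If the delimiters arrive as the plain string from the question, the alternation does not have to be typed out by hand. A minimal sketch, assuming the original delimiters string and using re.escape for the escaping (the names alternation and words are illustrative):

```python
import re

delimiters = " \t,;.?!-:@[](){}_*/"
txt = "agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23"

# One escaped alternative per delimiter character, joined with "|".
alternation = "|".join(re.escape(ch) for ch in delimiters)

# Adjacent delimiters such as "_(" produce empty strings; filter them out.
words = [w for w in re.split(alternation, txt) if w]
print(words)
```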
Upvotes: 0
Reputation: 815
Python 3 code
import re
txt="agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23"
delimiters = r"_|;|,|\)|\(|\[|\]"
list(filter(None, re.split(delimiters, txt)))
Output
['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']
Separate your symbols with | and use Python's built-in filter function to drop the empty strings.
Upvotes: 0