avinashse

Reputation: 1460

Split string with regex delimiter in Python

I have the following string:

txt='agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'

These are the delimiters:

delimiters = " \t,;.?!-:@[](){}_*/"

As output, I want this list of values:

"agadsfa","2asdf","sdfsaf","asfsadf","adsf","klnalfk","jn234kmafs","adfs","nlnawr23"

I tried using regex:

re.split(delimiters,txt)

But I'm getting this error:

re.error: unterminated character set at position 10

What is wrong here?

Upvotes: 1

Views: 405

Answers (4)

Sebastian Lusenti

Reputation: 31

Try this: replace every run of delimiter characters with a single comma, then split on the comma:

import re

txt = "agadsfa_(2asdf_sdfsaf)asfs?adf[adsf_klna!lfk;jn234kmafs)adfs, nlnawr*23"

line = re.sub(
    r"[ \t,;\.?!\-:@\[\](){}_*/]+",
    ",",
    txt,
)

print(line.split(","))
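The substitute-then-split step can probably be collapsed into one call, since re.split() accepts the same character set directly. A minimal sketch, assuming the same txt and pattern as above (the names pattern, via_sub and via_split are only for illustration):

```python
import re

txt = "agadsfa_(2asdf_sdfsaf)asfs?adf[adsf_klna!lfk;jn234kmafs)adfs, nlnawr*23"

# Same character set as above; the trailing + treats a run of delimiters as one
pattern = r"[ \t,;\.?!\-:@\[\](){}_*/]+"

# Two-step version from the answer: substitute a comma, then split on it
via_sub = re.sub(pattern, ",", txt).split(",")

# One-step version: split on the pattern directly
via_split = re.split(pattern, txt)

print(via_split)
```

For this input both versions should give the same list. If txt started or ended with a delimiter, either version would leave an empty string at that end, which you would have to filter out.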

Upvotes: 0

Óscar López

Reputation: 235984

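Your regular expression is incorrect: re.split() treats the delimiters string as a pattern, so the [ at position 10 opens a character set that is never closed (a ] placed directly after [ is taken as a literal character, not as the end of the set). From the comments, you've also added the requirement that the delimiters string must not be modified.

What we need to do, then, is process the delimiters string and convert it into a proper regex that can be passed to re.split(). Here's how: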

import re

# enclose the characters in [] so that we split on any one of them;
# ] and - are special inside a character set and need to be escaped
delimiters = ' \t,;.?!-:@[](){}_*/'
regex = delimiters.replace(']', r'\]').replace('-', r'\-')
regex = r'[{}]+'.format(regex)

The result is as expected:

txt = 'agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23'
re.split(regex, txt)
=> ['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']
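If you'd rather not work out by hand which characters need escaping, re.escape() can do it for you. A minimal sketch, assuming the same delimiters and txt as in the question (the names pattern and parts are only for illustration):

```python
import re

delimiters = " \t,;.?!-:@[](){}_*/"
txt = "agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23"

# re.escape() backslash-escapes the regex metacharacters in delimiters,
# so the whole string can be dropped into a character set unchanged
pattern = re.compile("[" + re.escape(delimiters) + "]+")

# Drop empty strings, which appear if txt starts or ends with a delimiter
parts = [p for p in pattern.split(txt) if p]
print(parts)
```

This leaves the delimiters string itself untouched, which seems to be what the comments asked for.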

Upvotes: 2

Benoît P

Reputation: 3265

You have to separate your delimiters with | (alternation) and escape the ones that are regex metacharacters:

import re

delimiters = r' |\t|,|;|\.|\?|!|-|:|@|\[|\]|\(|\)|\{|\}|_|\*|/'
# two delimiters next to each other produce an empty string, so filter those out
print([w for w in re.split(delimiters, txt) if w])
# or: list(filter(lambda a: a, re.split(delimiters, txt)))

The result is:

['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']
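The alternation doesn't have to be typed out by hand; it can be generated from the original delimiters string. A rough sketch, assuming the delimiters and txt from the question (alternation, raw and parts are illustrative names):

```python
import re

delimiters = " \t,;.?!-:@[](){}_*/"
txt = "agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23"

# Escape each delimiter character on its own, then join them with |
alternation = "|".join(re.escape(ch) for ch in delimiters)

# Without a + quantifier, adjacent delimiters such as "_(" yield an empty string
raw = re.split(alternation, txt)

# Keep only the non-empty pieces
parts = [w for w in raw if w]
print(parts)
```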

Upvotes: 0

Akash Kinwad

Reputation: 815

Python 3 code

import re

txt="agadsfa_(2asdf_sdfsaf)asfsadf[adsf_klnalfk;jn234kmafs)adfs,nlnawr23"

delimiters = r"_|;|,|\)|\(|\[|\]"

list(filter(None, re.split(delimiters, txt)))

Output

['agadsfa', '2asdf', 'sdfsaf', 'asfsadf', 'adsf', 'klnalfk', 'jn234kmafs', 'adfs', 'nlnawr23']

Separate your symbols with | and use Python's built-in filter() function to drop the empty strings. Note that this pattern lists only the delimiters that actually occur in txt.

Upvotes: 0
