Reputation: 2273

Regex: write the pattern to split with a comma

I need to create a tokenizer that splits a string on commas.

It's possible to do that with split using

re.split(',+', str)

But I need to use compile. I tried

import re

text = "5g, dynamic vision sensor (dvs), 3-d reconstruction, neuromorphic engineering, neural networks, humanoid robots, neuromorphics, closed loop systems, field programmable gate arrays, spiking motor controller, neuromorphic implementation, icub, relation neural network"
pattern = re.compile(r'[a-z0-9\(\)-]+')
print(pattern.findall(text))

And output is

['5g', 'dynamic', 'vision', 'sensor', '(dvs)', '3-d', 'reconstruction', 'neuromorphic', 'engineering', 'neural', 'networks', 'humanoid', 'robots', 'neuromorphics', 'closed', 'loop', 'systems', 'field', 'programmable', 'gate', 'arrays', 'spiking', 'motor', 'controller', 'neuromorphic', 'implementation', 'icub', 'relation', 'neural', 'network']

Desired output is

['5g', 'dynamic vision sensor (dvs)', '3-d reconstruction', 'neuromorphic engineering', 'neural networks', 'humanoid robots', 'neuromorphics', 'closed loop systems', 'field programmable gate arrays', 'spiking motor controller', 'neuromorphic implementation', 'icub', 'relation neural network']

Upvotes: 0

Answers (3)

mama

Reputation: 2227

Don't use regex for this. Just use Python's built-in split() method:

text = "5g, dynamic vision sensor (dvs), 3-d reconstruction, neuromorphic engineering, neural networks, humanoid robots, neuromorphics, closed loop systems, field programmable gate arrays, spiking motor controller, neuromorphic implementation, icub, relation neural network"

print(text.split(', '))
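
If a compiled pattern really is required, note that a compiled pattern object has its own split() method, so the re.split call from the question can be kept almost unchanged. The following is only a minimal sketch: the shortened sample text and the names splitter and tokens are assumptions for illustration, not part of the original post.

```python
import re

# Shortened sample text, assumed for illustration.
text = "5g, dynamic vision sensor (dvs), 3-d reconstruction, icub"

# One or more commas, plus any whitespace around them, act as the separator.
splitter = re.compile(r"\s*,+\s*")

# Drop empty strings that a leading or trailing comma would produce.
tokens = [t for t in splitter.split(text.strip()) if t]
print(tokens)
```

Unlike text.split(', '), this tolerates a missing or repeated space after a comma, which may or may not matter for your data.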

Upvotes: 1

Alireza

Reputation: 2123

Try this pattern: [a-z0-9() -]+(?=,|$)

Code:

import re

text = "5g, dynamic vision sensor (dvs), 3-d reconstruction, neuromorphic engineering, neural networks, humanoid robots, neuromorphics, closed loop systems, field programmable gate arrays, spiking motor controller, neuromorphic implementation, icub, relation neural network"
# Match a run of allowed characters that is followed by a comma or the end of the string.
pattern = re.compile(r'[a-z0-9() -]+(?=,|$)')
print([x.strip() for x in pattern.findall(text)])

Output:

['5g', 'dynamic vision sensor (dvs)', '3-d reconstruction', 'neuromorphic engineering', 'neural networks', 'humanoid robots', 'neuromorphics', 'closed loop systems', 'field programmable gate arrays', 'spiking motor controller', 'neuromorphic implementation', 'icub', 'relation neural network']
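
One caveat: the character class in this pattern only covers lowercase letters, digits, spaces, parentheses, and hyphens, so a token containing anything else (an uppercase letter, a slash) would be cut short or dropped. If that matters, a possible variant is to match "anything that is not a comma" instead. This is a sketch under that assumption; the sample text and the names field and tokens are made up for illustration.

```python
import re

# Sample text with uppercase letters and a slash, assumed for illustration.
text = "5G, Dynamic Vision Sensor (DVS), 3-d reconstruction, spiking/motor controller"

# Each match is a maximal run of non-comma characters.
field = re.compile(r"[^,]+")

# Strip the whitespace left around each run and skip runs that were only whitespace.
tokens = [m.strip() for m in field.findall(text)]
tokens = [t for t in tokens if t]
print(tokens)
```

The trade-off is that this version no longer validates which characters a token may contain; it only splits.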

Upvotes: 1

Hakan Akgün

Reputation: 927

As @mama said, you don't need regex for this, but if you specifically want to use re.compile, you can do it with the following code:

import re

text = "5g, dynamic vision sensor (dvs), 3-d reconstruction, neuromorphic engineering, neural networks, humanoid robots, neuromorphics, closed loop systems, field programmable gate arrays, spiking motor controller, neuromorphic implementation, icub, relation neural network"
pattern = re.compile(r'([\sa-z0-9\(\)-]+)')
# Each match is one comma-separated run; remove the space left after the comma.
tokens = [token.lstrip(" ") for token in pattern.findall(text)]
print(tokens)

Upvotes: 1
