Reputation: 23
I am not very good with regex, and it confuses me every time it comes up, so instead of writing a possibly incorrect regex, I want to split a string a different way.
Let's say I have a string "hello, my name is Joseph! Haha, hello!" and I want to split it whenever I encounter a non-alphanumeric character. So then, in this case, I would obtain:
"hello" "my" "name" "is" "Joseph" "Haha" "hello"
Is there a way to do this without a regex string? As in: split whenever character != alphanumeric?
(Yes, I do realize that leaving my regex deficiency uncorrected is probably not a smart thing to do!)
Upvotes: 1
Views: 60
Reputation: 1004
I'm always glad when someone tries to avoid using regex ;) But here it's probably the best tool for the job.
You can write your own parser, but that is much more verbose:
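s = "hello, my name is Joseph! Haha, hello!"
words = []
lasti = 0     # index where the current run of characters started
lastp = True  # True while inside separators; the start of the string counts as one
for i, p in enumerate(not c.isalnum() for c in s):
    if p != lastp:
        if p:  # a word just ended at index i
            words.append(s[lasti:i])
        lasti, lastp = i, p
if not lastp:  # the string ended in the middle of a word
    words.append(s[lasti:])
print(', '.join(words))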
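For comparison, here is a shorter non-regex sketch of the same idea: replace every non-alphanumeric character with a space and let str.split() do the rest. The helper name split_alnum is made up for illustration.

```python
def split_alnum(text):
    """Split text into runs of alphanumeric characters without using re."""
    # Turn every non-alphanumeric character into a space...
    spaced = "".join(c if c.isalnum() else " " for c in text)
    # ...then split on whitespace, which also discards empty pieces.
    return spaced.split()


s = "hello, my name is Joseph! Haha, hello!"
print(split_alnum(s))
```

Because str.split() with no arguments collapses runs of whitespace, leading, trailing, and repeated separators need no special handling.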
Upvotes: 0
Reputation: 626903
Personally, I think it is appropriate to use simple and straightforward regexes for such simple tasks.
Compare an itertools solution and a re solution:
import itertools, re
s = "hello, my name is Joseph! Haha, hello!"
print(["".join(x) for _, x in itertools.groupby(s, key=str.isalnum)][0::2])
print(re.findall(r"\w+", s))
As for me, I'd vote for the regex here. The \w+ pattern matches one or more word characters (letters, digits, underscores), and re.findall returns all the non-overlapping matches.
The itertools.groupby call groups consecutive characters into chunks according to the key, which is set to the alphanumeric test (str.isalnum), and [0::2] then keeps every other chunk starting with the first, which drops the non-word chunks in this concrete case. If the string starts with a non-word character, the slice keeps the wrong chunks, so the regex solution is safer and easier.
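As a rough sketch, the groupby variant can be made independent of how the string starts by filtering on the group key instead of slicing. The example also shows that \w treats the underscore as a word character while str.isalnum does not; [^\W_]+ is one way to exclude it. The sample string s2 is made up for illustration.

```python
import itertools
import re

# Hypothetical input: starts with a non-word character and contains an underscore.
s2 = "!hello, my_name is Joseph"

# Keep only the groups whose key is True, i.e. the alphanumeric runs.
by_key = ["".join(group)
          for is_word, group in itertools.groupby(s2, key=str.isalnum)
          if is_word]
print(by_key)

# \w+ keeps the underscore inside a word; [^\W_]+ splits on it as well.
words_w = re.findall(r"\w+", s2)
strict = re.findall(r"[^\W_]+", s2)
print(words_w)
print(strict)
```

For this input, the key-filtered groupby and the [^\W_]+ pattern give the same list, while \w+ returns my_name as a single token.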
Upvotes: 1