Reputation: 785
I am trying to remove special characters from each word in a string. The code below does count the words, but I can't get .isalpha() to remove the non-alphabetic characters. Is anyone able to assist? Thank you in advance.
input = 'Hello, Goodbye hello hello! bye byebye hello?'
word_list = input.split()
for word in word_list:
    if word.isalpha()==False:
        word[:-1]
di = dict()
for word in word_list:
    di[word] = di.get(word,0)+1
di
Upvotes: 2
Views: 3729
Reputation: 2882
One solution using re:
In [1]: import re
In [2]: a = 'Hello, Goodbye hello hello! bye byebye hello?'
In [3]: ' '.join([i for i in re.split(r'[^A-Za-z]', a) if i])
Out[3]: 'Hello Goodbye hello hello bye byebye hello'
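A variant of the same idea, as a rough sketch: re.findall can collect the runs of letters directly, which avoids filtering out the empty strings that re.split leaves behind. The pattern [A-Za-z]+ assumes that a "word" means ASCII letters only, as in the snippet above.

```python
import re

a = 'Hello, Goodbye hello hello! bye byebye hello?'

# Each match is a maximal run of ASCII letters, so punctuation never
# reaches the result and there are no empty strings to filter out.
words = re.findall(r'[A-Za-z]+', a)
cleaned = ' '.join(words)
print(cleaned)
```

Note that, like the re.split version, this splits a word such as "don't" into two pieces, because the apostrophe is not a letter.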
Upvotes: 1
Reputation: 164773
You are nearly there with your for loop. The main stumbling block seems to be that word[:-1] on its own does nothing; you need to store the result somewhere, for example by appending it to a list.
You also need to specify what happens to strings which don't need modifying. I'm also not sure what purpose the dictionary serves.
So here's your for loop rewritten:
mystring = 'Hello, Goodbye hello hello! bye byebye hello?'
word_list = mystring.split()
res = []
for word in word_list:
    if not word.isalpha():
        res.append(word[:-1])
    else:
        res.append(word)
mystring_out = ' '.join(res)  # 'Hello Goodbye hello hello bye byebye hello'
The idiomatic way to write the above is to feed a list comprehension to str.join:
mystring_out = ' '.join([word[:-1] if not word.isalpha() else word
                         for word in mystring.split()])
It goes without saying that this assumes word.isalpha() returns False because of a single unwanted character at the end of the string, and that this is the only kind of special character you want to handle.
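If that assumption is too narrow, one possible alternative (a sketch, not part of the approach above) is to strip punctuation from both ends of each word with str.strip and string.punctuation. This handles several trailing or leading characters at once, but it only covers the ASCII punctuation listed in string.punctuation.

```python
import string

mystring = 'Hello, Goodbye hello hello! bye byebye hello?'

# str.strip with an argument removes any of the given characters from
# both ends of the string, however many there are. Characters in the
# middle of a word (such as the apostrophe in "don't") are left alone.
res = [word.strip(string.punctuation) for word in mystring.split()]
mystring_out = ' '.join(res)
print(mystring_out)
```

A token made up entirely of punctuation becomes an empty string here, so you may want to filter those out before counting.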
Upvotes: 1
Reputation: 25370
It seems you are expecting word[:-1] to remove the last character of word and have that change reflected in the list word_list. However, strings are immutable, so word[:-1] only builds a new string, which is then discarded. The loop variable word is just a name bound to each string in word_list, so even reassigning it would not change the list itself.
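A small demonstration of this behaviour, using a shortened, made-up list for illustration (the name snapshot is hypothetical and only there to record the intermediate state):

```python
word_list = ['Hello,', 'Goodbye']

for word in word_list:
    word[:-1]          # builds a new string, which is discarded immediately
    word = word[:-1]   # rebinds the loop variable only; the list is untouched

snapshot = list(word_list)  # copy of the list after the first loop
print(snapshot)

# Assigning back by index is what actually changes the list.
for i, word in enumerate(word_list):
    if not word.isalpha():
        word_list[i] = word[:-1]

print(word_list)
```

The first print shows the list unchanged; only the second loop, which writes to word_list[i], alters it.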
A simple fix would be to create a new list and append the values to it. Note that your original string is called input, which shadows the built-in input() function; that is not a good idea:
input_string = 'Hello, Goodbye hello hello! bye byebye hello?'
word_list = input_string.split()
new = []
for word in word_list:
    if not word.isalpha():
        new.append(word[:-1])
    else:
        new.append(word)
di = dict()
for word in new:
    di[word] = di.get(word, 0) + 1
print(di)
# {'byebye': 1, 'bye': 1, 'Hello': 1, 'Goodbye': 1, 'hello': 3}
You could also remove the second for loop and use collections.Counter instead:
from collections import Counter
print(Counter(new))
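Since Counter is a dict subclass, it can be used much like the di dictionary. A short self-contained sketch, with the cleaned list written out by hand so it runs on its own:

```python
from collections import Counter

# The cleaned words, as produced by the loop above.
new = ['Hello', 'Goodbye', 'hello', 'hello', 'bye', 'byebye', 'hello']

counts = Counter(new)

print(counts['hello'])        # look up a count like a normal dict
print(counts['missing'])      # unseen keys give 0 instead of a KeyError
print(counts.most_common(1))  # the most frequent word with its count
```

Note that 'Hello' and 'hello' are counted separately; lowercasing each word first would merge them, if that is what you want.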
Upvotes: 1