Remove non-alphabet (preferably using lambda func or something else short but not for-loop)

Question

I wrote the following code, which prints how often each character appears in a file named 'Test'.

from os import strerror
from collections import Counter

try:
    with open ('Test', 'rt') as handle:
        content = handle.read().lower().replace(' ', '').replace('\n', '')
        counts = Counter(content)
    for i in sorted(counts, key=lambda x: counts[x], reverse=True)[:30]:
        print('{} -> {}'.format(i, counts[i]))
    
except IOError as e:
    print('I/O error occurred: ', strerror(e.errno))

The output is:

e -> 383
o -> 247
s -> 226
t -> 224
n -> 219
a -> 217
r -> 201
i -> 188
d -> 127
h -> 125
l -> 112
c -> 112
m -> 105
u -> 72
f -> 59
p -> 59
g -> 58
y -> 48
b -> 47
. -> 36
w -> 35
, -> 35
v -> 28
k -> 25
0 -> 15
- -> 9
% -> 8
1 -> 7
’ -> 7
x -> 7

Afterward I realized I only need the alphabetic characters. I figured I have to modify line #6:

content = handle.read().lower().replace(' ', '').replace('\n', '')

I am aware I could just write a for-loop with the conditional expression str.isalpha() to remove the non-alphabetic characters.

I wonder if there is a better (shorter) way to do that.

Thank you in advance for your feedback:-)

user2390182 · Accepted Answer

You can do it all in one go, using a generator expression or filter:

counts = Counter(filter(str.isalpha, handle.read().lower()))
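The generator-expression form mentioned above is not shown, so here is a minimal sketch of both variants side by side. The sample string is made up for illustration; in the question's code it would be handle.read().

```python
from collections import Counter

# Hypothetical sample text standing in for handle.read()
text = "Hello, World! 123"

# Variant 1: filter with the unbound method str.isalpha
counts_filter = Counter(filter(str.isalpha, text.lower()))

# Variant 2: the equivalent generator expression
counts_genexp = Counter(ch for ch in text.lower() if ch.isalpha())

# Both drop spaces, punctuation, and digits in a single pass
print(counts_filter == counts_genexp)
```

Note that str.isalpha is also true for non-ASCII letters such as 'é'. If only a-z should be counted, testing membership in string.ascii_lowercase is one possible alternative.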

Btw, you should also consider using Counter.most_common for your output:

for k, n in counts.most_common(30):
    print('{} -> {}'.format(k, n))
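Putting both suggestions together, the whole script might look roughly like this. It is a sketch, not the asker's final code: count_letters is a hypothetical helper name, and OSError is used in place of IOError (the two are the same class in Python 3).

```python
from collections import Counter


def count_letters(text):
    """Return a Counter of the alphabetic characters in text, lowercased."""
    return Counter(filter(str.isalpha, text.lower()))


if __name__ == '__main__':
    try:
        with open('Test', 'rt') as handle:
            counts = count_letters(handle.read())
    except OSError as e:
        print('I/O error occurred:', e.strerror)
    else:
        # most_common(30) yields (character, count) pairs, highest count first
        for k, n in counts.most_common(30):
            print('{} -> {}'.format(k, n))
```

Keeping the counting in a small function makes it easy to try on a plain string without needing the file.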
