Reputation: 1
The sample below strips punctuation from the text of ranbo.txt and converts it to lower case.
How can I split the result on whitespace?
import string
with open('ranbo.txt', 'r') as infile:
    lowercased = infile.read().lower()
for c in string.punctuation:
    lowercased = lowercased.replace(c, "")
white_space_words = lowercased.split(?????????)  # what goes here?
print(white_space_words)
After the split, how can I find how many words are in the list?
Should I use count() or len()?
Upvotes: 0
Views: 1780
Reputation: 212955
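white_space_words = lowercased.split()
With no argument, split() splits on runs of whitespace of any length (spaces, tabs, newlines) and ignores leading and trailing whitespace, so the result contains no empty strings. For example: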
'a b \t cd\n ef'.split()
returns ['a', 'b', 'cd', 'ef'].
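To show why the no-argument form is usually what you want here, a minimal comparison of split() against split(' '), which treats every single space as a separator and keeps the empty strings between consecutive spaces. The sample string is made up for illustration:

```python
# Made-up sample text with a double space, a tab, a newline and a trailing space.
sample = "one  two\tthree\nfour "

# No argument: split on runs of any whitespace and drop empty strings.
default_split = sample.split()

# Explicit single space: each space is a separator, so consecutive spaces
# yield empty strings, and tabs/newlines are NOT treated as separators.
space_split = sample.split(" ")

print(default_split)  # ['one', 'two', 'three', 'four']
print(space_split)    # ['one', '', 'two\tthree\nfour', '']
```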
Alternatively, you can do it the other way round and match the words themselves instead of splitting on the separators:
import re
words = re.findall(r'\w+', text)
This returns a list of all "words" in text, where a word is a run of letters, digits, or underscores.
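A quick sketch of how the regex approach handles punctuation, using a made-up sentence. Because \w does not match punctuation, no separate stripping loop is needed, but note that an apostrophe splits a contraction such as "it's" into two tokens:

```python
import re

# Made-up sample sentence for illustration.
text = "Hello, world! It's a test_case: 42 times."

# \w+ matches runs of letters, digits and underscores; anything else separates words.
words = re.findall(r"\w+", text.lower())

print(words)  # ['hello', 'world', 'it', 's', 'a', 'test_case', '42', 'times']
```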
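Get the number of words with len(); list.count(x) is not what you want here, because it only counts how many times a single value x occurs:
len(words)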
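A small sketch of the difference between the two, with a made-up word list: len() gives the total number of items, list.count(x) gives the occurrences of one value, and len(set(...)) is one way to count distinct words:

```python
# Made-up word list for illustration.
words = ["the", "cat", "saw", "the", "dog"]

total_words = len(words)        # number of items in the list
the_count = words.count("the")  # occurrences of the single value "the"
unique_words = len(set(words))  # number of distinct words

print(total_words, the_count, unique_words)  # 5 2 4
```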
And if you want to join them into a new string with one word per line:
text = '\n'.join(words)
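A minimal illustration, with a made-up list, of what the join produces: the separator goes only between items, so there is no trailing newline, and str.splitlines() turns the string back into the original list:

```python
# Made-up word list for illustration.
words = ["alpha", "beta", "gamma"]

# The separator is placed between items only, so there is no trailing newline.
text = "\n".join(words)
print(repr(text))  # 'alpha\nbeta\ngamma'

# splitlines() undoes the join.
print(text.splitlines())  # ['alpha', 'beta', 'gamma']
```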
Putting it all together:
import re
with open('ranbo.txt', 'r') as f:
    lowercased = f.read().lower()
words = re.findall(r'\w+', lowercased)
number_of_words = len(words)
text = '\n'.join(words)
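The steps above can also be wrapped in a small reusable function. This is a minimal Python 3 sketch; the function name count_words and the encoding="utf-8" argument are assumptions for illustration, not part of the original answer:

```python
import re


def count_words(path):
    """Return (number_of_words, words) for the text file at path.

    Hypothetical helper: lowercases the file and extracts runs of
    word characters with re.findall, as in the answer above.
    """
    # Assumption: the file is UTF-8 encoded; change this if it is not.
    with open(path, "r", encoding="utf-8") as f:
        lowercased = f.read().lower()
    words = re.findall(r"\w+", lowercased)
    return len(words), words
```

Usage would be something like number_of_words, words = count_words('ranbo.txt'), after which '\n'.join(words) gives the one-word-per-line string.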
Upvotes: 1