Reputation: 524
I am new to Python and am trying to find the longest word in alice_in_wonderland.txt. I think I have a good system set up (see below), but my output is a "word" made of several words joined by dashes. Is there some way to remove the dashes when reading the file? For the text file visit here
sample from text file:
That's very important,' the King said, turning to the jury. They were just beginning to write this down on their slates, when the White Rabbit interrupted: UNimportant, your Majesty means, of course,' he said in a very respectful tone, but frowning and making faces at him as he spoke. " UNimportant, of course, I meant,' the King hastily said, and went on to himself in an undertone, important--unimportant-- unimportant--important--' as if he were trying which word sounded best."
code:
#String input
with open("alice_in_wonderland.txt", "r") as myfile:
    string=myfile.read().replace('\n','')
#initialize list
my_list = []
#Split words into list
for word in string.split(' '):
    my_list.append(word)
#initialize list
uniqueWords = []
#Fill in new list with unique words to shorten final printout
for i in my_list:
    if not i in uniqueWords:
        uniqueWords.append(i)
#Length of longest word
count = 0
#Longest word place holder
longest = []
for word in uniqueWords:
    if len(word)>count:
        longest = word
        count = len(longest)
print longest
Upvotes: 0
Views: 13351
Reputation: 414139
>>> import nltk # pip install nltk
>>> nltk.download('gutenberg')
>>> words = nltk.corpus.gutenberg.words('carroll-alice.txt')
>>> max(words, key=len) # find the longest word
'disappointment'
Upvotes: 3
Reputation: 142136
Here's one way using re and mmap:
import re
import mmap
with open('your alice in wonderland file') as fin:
    mf = mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ)
    words = re.finditer(r'\w+', mf)
    print max((word.group() for word in words), key=len)
# disappointment
This is more memory-efficient than reading the whole file into a string first: the file is memory-mapped and the regex scans the mapping directly.
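For anyone on Python 3, here is a rough sketch of the same re + mmap idea. The function name longest_word_mmap and the UTF-8 decoding step are my own assumptions, not part of the snippet above; the main difference is that the file is opened in binary mode and the pattern is a bytes pattern, because the mapping exposes bytes.

```python
import mmap
import re


def longest_word_mmap(path):
    """Return the longest run of word characters in the file at `path`.

    Hypothetical helper for illustration; assumes a non-empty text file.
    """
    # Binary mode: the mapping exposes bytes, so the regex must be bytes too.
    with open(path, "rb") as fin:
        # Length 0 maps the whole file (mmap raises ValueError on an empty file).
        mf = mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ)
        # The generator yields one matched word at a time; max keeps the first longest.
        longest = max(
            (m.group() for m in re.finditer(rb"\w+", mf)),
            key=len,
            default=b"",
        )
    # Assumption: the text is ASCII or UTF-8, so the match decodes cleanly.
    return longest.decode("utf-8")
```

You would call it as longest_word_mmap("alice_in_wonderland.txt"). Since \w+ never matches a dash, "important--unimportant" is seen as two separate words without any extra cleanup.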
Upvotes: 2
Reputation: 632
Use str.replace to replace the dashes with spaces (or whatever you want). To do this, simply add another call to replace after the first call on line 3:
string=myfile.read().replace('\n','').replace('-', ' ')
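To show the effect on a small input, here is a minimal sketch in Python 3 syntax. The helper name longest_word is made up for illustration. It goes a little beyond the one-line change above, and both extras are assumptions about what you want: newlines become spaces instead of being deleted (deleting them glues the last word of one line to the first word of the next), and surrounding punctuation is stripped from each word.

```python
import string


def longest_word(text):
    """Return the longest word in `text`, treating dashes as word breaks.

    Hypothetical helper for illustration only.
    """
    # Turn newlines and dashes into spaces so neighbouring words stay separate.
    cleaned = text.replace("\n", " ").replace("-", " ")
    # split() with no argument collapses runs of whitespace and drops empty pieces.
    words = [w.strip(string.punctuation) for w in cleaned.split()]
    # max() with key=len returns the first word of maximal length.
    return max(words, key=len, default="")


sample = "important--unimportant-- unimportant--important--' as if he\nwere trying"
print(longest_word(sample))  # unimportant
```

One caveat: replacing every dash also splits genuinely hyphenated words, so decide whether that is acceptable for your text.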
Upvotes: 0