Reputation: 29
article = open("article.txt", encoding="utf-8")
for i in article:
print(max(i.split(), key=len))
The text is written with line breaks, and it gives me the longest words from each line. How to get the longest word from all of the text?
Upvotes: 0
Views: 130
Reputation: 11
If your file is large enough to fit in memory, you can read all line at once.
file = open("article.txt", encoding="utf-8", mode='r')
all_text = file.read()
longest = max(i.split(), key=len)
print(longest)
Upvotes: 0
Reputation: 548
There are many ways by which you could do that. This would work
with open("article.txt", encoding="utf-8") as article:
txt = [word for item in article.readlines() for word in item.split(" ")]
biggest_word = sorted(txt, key=lambda word: (-len(word), word), )[0]
Note that I am using a with statement to close the connection to the file when the reading is done, that I use readlines to read the entire file, returing a list of lines, and that I unpack the split items twice to get a flat list of items. The last line of code sorts the list and uses -len(word)
to inverse the sorting from ascending to descending.
I hope this is what you are looking for :)
Upvotes: 0
Reputation: 4498
Instead of iterating through each line, you can get the entire text of the file and then split them using article.readline().split()
article = open("test.txt", encoding="utf-8")
print(max(article.readline().split(), key=len))
article.close()
Upvotes: 0
Reputation: 521103
One approach would be to read the entire text file into a Python string, remove newlines, and then find the largest word:
with open('article.text', 'r') as file:
data = re.sub(r'\r?\n', '', file.read())
longest_word = max(re.findall(r'\w+', data), key=len)
Upvotes: 1
Reputation: 89
longest = 0
curr_word = ""
with open("article.txt", encoding="utf-8") as f:
for line in f:
for word in line.split(" "): # Use line-by-line generator to avoid loading large file in memory
word = word.strip()
if (wl := len(word)) > longest: # Python 3.9+, otherwise use 2 lines
longest = wl
curr_word = word
print(curr_word)
Upvotes: 0