Reputation: 47
I am new to Python, so I'm doing some challenges, and one of them is to find the number of unique words in a text file. The text file has 212 unique words in it, but the code I have only shows 0. Thank you for your help.
words=[]
count=0
with open ("text.txt","r") as file:
    for line in file:
        if line in words:
            words.append(line)
            k+=1
        else:
            pass
print(k)
Upvotes: 1
Views: 12668
Reputation: 347
# The with block closes the file automatically
with open("names.txt", "r") as file:
    read_data = file.read()
# A set keeps only one copy of each word
words = set(read_data.split())
count = len(words)
print('Total Unique Words:', count)
Replace names.txt with your file name.
Upvotes: 0
Reputation: 12571
There's quite a bit wrong in your example snippet:
- A dict or set would be a better choice in this case than a list.
- The else condition is unnecessary.

Here's a simple implementation that fixes these issues, and uses a few neat language features:
with open("test.txt", "r") as file:
    lines = file.read().splitlines()

uniques = set()
for line in lines:
    uniques |= set(line.split())

print(f"Unique words: {len(uniques)}")
This example uses sets and f-strings, the latter of which is only available in Python 3.6+. Note, however, that we're "slurping" the entire file contents into a variable, which could be bad if the file is very large. I'm assuming that your example file is small.
Also, this example doesn't handle punctuation and similar cases, so "test" will be counted as a different word than "test." (with a period). Fixing that is left as an exercise to the reader.
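For anyone who wants a starting point on the punctuation and large-file caveats, here is one possible sketch, not the only way to do it. It assumes words are separated by whitespace, that stripping leading and trailing ASCII punctuation and lowercasing is acceptable for your text, and that the file is UTF-8. The function name count_unique_words is made up for illustration.

```python
import string


def count_unique_words(path):
    """Count distinct words in a text file, ignoring case and edge punctuation."""
    uniques = set()
    # encoding="utf-8" is an assumption; adjust it to match your file.
    with open(path, "r", encoding="utf-8") as file:
        # Iterating over the file object reads one line at a time,
        # so the whole file is never held in memory at once.
        for line in file:
            for token in line.split():
                # Strip punctuation from both ends, e.g. "test." -> "test".
                word = token.strip(string.punctuation).lower()
                if word:  # skip tokens that were only punctuation
                    uniques.add(word)
    return len(uniques)
```

Punctuation inside a word (for example the apostrophe in "don't") is left alone, so this is only a rough normalization.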
Upvotes: 0
Reputation: 356
There seems to be an error in the code snippet, since k is not declared. I am assuming you were trying to count the number of unique words instead.
Also, there is a better way to find the unique values in a list: convert it into a set. A set never contains duplicate values.
Check out the code snippet below.
with open("text.txt", "r") as f:
    # Read the file, split it into words, and convert the list into a set
    words = set(f.read().split())
count = len(words)
print(count)
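One thing to watch out for: a set built from the lines of a file and a set built from its words give different counts, and the question asks about words. A small illustration using an in-memory string (the sample text is made up):

```python
text = "red blue\nred blue\ngreen red\n"

# A set of lines deduplicates whole lines (newline characters included).
unique_lines = set(text.splitlines(keepends=True))

# A set of whitespace-separated tokens deduplicates individual words.
unique_words = set(text.split())

print(len(unique_lines))  # 2
print(len(unique_words))  # 3
```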
Upvotes: 3
Reputation: 54
Change the condition to if line not in words: you want to add the word if it is not in your list yet, and ignore it if it is already there.
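Applied to the code in the question, that change could look roughly like the sketch below. Beyond flipping the condition, it makes two further assumptions: each line is split into words (the original compared whole lines), and a single counter name is used, since k was never defined. The function wrapper count_unique is added only for illustration.

```python
def count_unique(path):
    """Count distinct whitespace-separated words using a plain list."""
    words = []
    count = 0
    with open(path, "r") as file:
        for line in file:
            for word in line.split():
                # Only record a word the first time it is seen.
                if word not in words:
                    words.append(word)
                    count += 1
    return count
```

Calling count_unique("text.txt") would then return the number of distinct words. The list lookup gets slow on big files, so a set is still the better choice there.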
Upvotes: 0