Reputation: 11
I have a file full of sentences and I want to build unigram counts from it.
This is my code, but it only counts single letters and I want it to count words:
old_lines = open("f.final",'r').readlines()
new_lines = []
for line in old_lines:
    words = line.split()
    new_lines.append(words)
print new_lines
for lines in new_lines:
    c = Counter(str(lines))
with open("final.final", 'w') as f:
    for k,v in c.items():
        f.write("{} {}\n".format(k,v))
Upvotes: 0
Views: 27
Reputation: 78556
You're building the counter from a string (i.e. str(lines)), so it counts each character in that string rather than each word. You should build the counter directly from the list of words. It should also cover all the lines, not just the last one, since c is overwritten on every pass through the loop:
from collections import Counter

with open("f.final") as f, open("final.final", 'w') as out_f:
    # take count of all words from all lines
    c = Counter(word for line in f for word in line.strip().split())
    # write to output file
    for k, v in c.items():
        out_f.write("{} {}\n".format(k, v))
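To see why str(lines) gives per-character counts, here is a small sketch that compares the two calls on a made-up sentence. The sample text and the variable names are only for illustration and are not taken from your file:

```python
from collections import Counter

# Hypothetical sample line, standing in for one line of f.final
words = "the cat saw the dog".split()

# str(words) is the text "['the', 'cat', ...]", so Counter walks it
# character by character (brackets, quotes and commas included)
char_counts = Counter(str(words))

# Passing the list itself makes Counter count each list item, i.e. each word
word_counts = Counter(words)

print(char_counts.most_common(3))
print(word_counts.most_common(3))
```

In the first counter the keys are single characters, so a word like "the" never shows up as a key. In the second, word_counts["the"] is 2.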
Upvotes: 1