Reputation: 153
    import sys

    def main():
        sent_file = open(sys.argv[1])
        tweet_file = open(sys.argv[2])
        scores = {}
        for line in sent_file:
            term, score = line.split("\t")
            scores[term] = int(score)
The sent_file looks something like this:

    abandon -2
    abandoned -2

with each term and its score separated by \t. Could anybody help me figure out this problem?
Upvotes: 0
Views: 4590
Reputation: 1121486
To skip empty lines or lines without a \t, catch the ValueError exception in those cases:
    for line in sent_file:
        try:
            term, score = line.split("\t")
            scores[term] = int(score)
        except ValueError:
            pass
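As a rough illustration of why the exception occurs, the sketch below runs the same loop over a few in-memory lines. A line with no tab splits into a single element, so unpacking it into two names raises ValueError. The function name `load_scores` and the `skipped` list are hypothetical additions for this example and are not part of the original code; recording line numbers instead of silently passing is just one way to see which lines were dropped.

```python
def load_scores(lines):
    """Parse tab-separated 'term<TAB>score' lines; hypothetical helper."""
    scores = {}
    skipped = []  # assumption: we track skipped line numbers for inspection
    for lineno, line in enumerate(lines, start=1):
        try:
            # A line without a tab yields one element, so unpacking fails
            term, score = line.split("\t")
            # int() tolerates the trailing newline but not non-numeric text
            scores[term] = int(score)
        except ValueError:
            skipped.append(lineno)
    return scores, skipped


scores, skipped = load_scores(
    ["abandon\t-2\n", "\n", "abilities 2\n", "abandoned\t-2\n"]
)
print(scores)
print(skipped)
```

Here the empty line and the space-separated line are the ones that end up in `skipped`.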
However, from the comments it appears you have data that is space-separated as well ('abilities 2\n' has no \t character in the line), so perhaps you should split on general whitespace instead:
    for line in sent_file:
        try:
            term, score = line.rsplit(None, 1)  # split on last whitespace separator
            scores[term] = int(score)
        except ValueError:
            pass
Now you are splitting on the last arbitrary-width separator on the line (not counting whitespace at the start and end), and only splitting once. If any of your terms contain whitespace too, this ensures that they are preserved. This assumes your score values do not have any whitespace in them (which would also break with your own code).
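To make the behaviour of rsplit(None, 1) concrete, here is a small sketch applied to a handful of sample lines. The helper name `parse_line` and the sample strings (apart from the two taken from the question and comments) are made up for illustration.

```python
def parse_line(line):
    """Return (term, score), or None if the line cannot be parsed.

    Hypothetical helper wrapping the rsplit approach described above.
    """
    try:
        # Split once, from the right, on any run of whitespace
        term, score = line.rsplit(None, 1)
        return term, int(score)
    except ValueError:
        # Raised for empty lines, lines with one field, or non-integer scores
        return None


samples = [
    "abandon\t-2\n",      # tab-separated
    "abilities 2\n",      # space-separated
    "can't stand\t-3\n",  # term containing a space
    "\n",                 # empty line
    "no_score\n",         # only one field
]
for sample in samples:
    print(repr(sample), "->", parse_line(sample))
```

The term with an embedded space stays in one piece because only the last whitespace run is used as the split point.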
If you are certain that all you have is \t separated data, or you can clean up your input files to use only tabs, an alternative could be to use the csv module instead, and to use a dict comprehension:
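    import csv
    import sys

    # Python 3: open in text mode with newline='' (on Python 2, use mode 'rb' instead)
    with open(sys.argv[1], newline='') as sent_file:
        reader = csv.reader(sent_file, delimiter='\t')
        scores = {term: int(score) for term, score in reader}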
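Note that unpacking each row into term and score raises ValueError for any row that does not have exactly two fields, and csv.reader yields an empty list for a blank line. A possible variant, sketched here with an in-memory file and a hypothetical function name `load_scores_csv`, filters on the row length so that blank lines are ignored. It still assumes every two-field row has an integer score.

```python
import csv
import io


def load_scores_csv(fileobj):
    """Read tab-separated term/score rows; hypothetical helper."""
    reader = csv.reader(fileobj, delimiter="\t")
    # Keep only rows with exactly two fields; blank lines come back as []
    return {row[0]: int(row[1]) for row in reader if len(row) == 2}


# io.StringIO stands in for the real file opened from sys.argv[1]
sample = io.StringIO("abandon\t-2\nabandoned\t-2\n\ncan't stand\t-3\n")
scores = load_scores_csv(sample)
print(scores)
```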
Upvotes: 1