Reputation: 479
I have a data file that looks like this:
TOPIC:topic_0 2056
ab 2.0
cd 5.0
ef 3.0
gh 10.0
TOPIC:topic_1 1000
aa 3.0
bd 5.0
gh 2.0
and so on, up to TOPIC:topic_2000. The first line of each block is the topic and its weight. The lines that follow are the words in that topic and their respective weights.
Now, I want to sum up the second column of each topic and check what value it gives. That is, I want to get the output as:
Topic:topic_0 20
Topic:topic_1 10
That is, the topic name and the sum of the second column for that topic (for example, in topic_0 the word weights are 2, 5, 3, and 10, which sum to 20). I tried using:
with open('Input.txt') as in_file:
    for line in in_file:
        columns = line.split(' ')
        value = columns[0]
        if value[:6] == 'TOPIC:':
            total_value = columns[1]
            total_value = total_value[:-1]
            total_values = float(total_value)
            #print '\n'
            print columns[0]
But, I am not sure how to proceed from this. This is just printing the topic numbers. Please help!
Reputation: 37069
Try this. It works with both Python 2.7 and 3.5:
import re
temp = ''
topic = {}
# Matches runs of lowercase letters, i.e. the word in front of each weight
p = re.compile('[a-z]*')
with open('Input.txt') as in_file:
    for line in in_file:
        line = line.strip()
        if not line:
            continue
        if line.startswith('TOPIC:'):
            # Header line: keep the topic name and start its total at zero
            temp = (line.split(' ')[0]).replace('TOPIC:', '')
            topic[temp] = 0
        else:
            # Word line: strip the word so only the weight remains
            value = p.sub('', line).strip()
            topic[temp] = float(topic[temp]) + float(value)
for key in topic:
    print("Topic:%s %s" % (key, topic[key]))
Result:
$ /c/Python27/python.exe input.py
Topic:topic_1 10.0
Topic:topic_0 20.0
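Note that the pattern [a-z]* removes only lowercase letters, so a word containing digits or uppercase characters would leave text behind that float() rejects. The following is a minimal sketch of an alternative, assuming every word line is a word followed by whitespace and then the weight. It takes the last whitespace-separated field instead of using a regex. The function name sum_topic_weights and the inline sample are hypothetical, not part of the original answer.

```python
from collections import OrderedDict


def sum_topic_weights(lines):
    """Return an ordered mapping of topic name -> sum of its word weights."""
    totals = OrderedDict()
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('TOPIC:'):
            # Header line such as "TOPIC:topic_0 2056": keep only the name.
            current = line.split()[0][len('TOPIC:'):]
            totals[current] = 0.0
        elif current is not None:
            # Word line such as "ab 2.0": the weight is the last field.
            totals[current] += float(line.split()[-1])
    return totals


# Sample data copied from the question (assumed representative of the file).
sample = """TOPIC:topic_0 2056
ab 2.0
cd 5.0
ef 3.0
gh 10.0
TOPIC:topic_1 1000
aa 3.0
bd 5.0
gh 2.0
"""

for name, total in sum_topic_weights(sample.splitlines()).items():
    print("Topic:%s %s" % (name, total))
```

To run it on the real file, pass the open file object instead of the sample, for example sum_topic_weights(in_file) inside the with open('Input.txt') block. The OrderedDict keeps topics in file order on Python 2.7 as well.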
Reputation: 9044
import re
input = """
TOPIC:topic_0 2056
ab 2.0
cd 5.0
ef 3.0
gh 10.0
TOPIC:topic_1 1000
aa 3.0
bd 5.0
gh 2.0
"""
result = {}
for line in input.splitlines():
    line = line.strip()
    if not line:
        continue
    columns = re.split(r"\s+", line)
    value = columns[0]
    if value[:6] == 'TOPIC:':
        # New topic header: start an empty list of weights for it
        result[value] = []
        points = result[value]
        continue
    points.append(float(columns[1]))
for k, v in result.items():
    # Formatted print works the same on Python 2 and 3
    print("%s %s" % (k, sum(v)))
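The loop above stores every weight in a list and sums at the end. If only the sums are needed, a running total per topic is enough, and a generator can emit each topic as soon as its block ends. This is a rough sketch under the same assumptions about the file layout. The names iter_topic_sums and format_sum are hypothetical, and the %g format is one possible way to get the "Topic:topic_0 20" shape asked for in the question.

```python
def iter_topic_sums(lines):
    """Yield (header, total) pairs in file order, one per TOPIC block."""
    header, total = None, 0.0
    for line in lines:
        columns = line.split()
        if not columns:
            continue  # skip blank lines
        if columns[0].startswith('TOPIC:'):
            # A new header closes the previous block, if there was one.
            if header is not None:
                yield header, total
            header, total = columns[0], 0.0
        elif header is not None:
            total += float(columns[1])
    # Emit the final block, which has no following header to close it.
    if header is not None:
        yield header, total


def format_sum(header, total):
    # "TOPIC:topic_0" -> "Topic:topic_0". %g drops a trailing ".0" but
    # keeps only 6 significant digits, so very large sums would be rounded.
    return "Topic:%s %g" % (header[len('TOPIC:'):], total)


# Sample lines copied from the question (assumed representative).
sample_lines = [
    "TOPIC:topic_0 2056", "ab 2.0", "cd 5.0", "ef 3.0", "gh 10.0",
    "TOPIC:topic_1 1000", "aa 3.0", "bd 5.0", "gh 2.0",
]

for header, total in iter_topic_sums(sample_lines):
    print(format_sum(header, total))
```

Because the generator accepts any iterable of lines, an open file object can be passed directly, so the whole file never needs to be held in memory.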