Ana_Sam

Reputation: 479

Summing columns in a text file

I have a data file that looks like this:

 TOPIC:topic_0 2056
 ab  2.0
 cd  5.0
 ef  3.0
 gh  10.0

 TOPIC:topic_1 1000
 aa  3.0
 bd  5.0
 gh  2.0

and so on, up to TOPIC:topic_2000. The first line of each block is the topic name and its weight; the lines after it are the words in that topic and their respective weights.
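
Each line is therefore one of three kinds: a topic header, a word/weight pair, or a blank separator. A minimal sketch of classifying a single line, assuming fields are separated by whitespace (the helper name `parse_line` is hypothetical, not from the question):

```python
def parse_line(line):
    """Classify one line of the topic file (hypothetical helper).

    Returns ('topic', name, weight) for a header such as "TOPIC:topic_0 2056",
    ('word', word, weight) for a line such as "ab  2.0",
    or None for a blank line.
    """
    fields = line.split()  # any run of spaces/tabs separates fields
    if not fields:
        return None
    name, weight = fields[0], float(fields[1])
    if name.startswith('TOPIC:'):
        return ('topic', name, weight)
    return ('word', name, weight)


print(parse_line(" TOPIC:topic_0 2056"))  # ('topic', 'TOPIC:topic_0', 2056.0)
print(parse_line(" ab  2.0"))             # ('word', 'ab', 2.0)
print(parse_line("\n"))                   # None
```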

Now I want to sum the second column of each topic block and see what value it gives. That is, I want output like this:

 Topic:topic_0  20
 Topic:topic_1  10

That is, the topic name followed by the sum of its word weights (in topic_0, for example, the weights are 2, 5, 3 and 10, which add up to 20). I tried this:

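with open('Input.txt') as in_file:
    for line in in_file:
        columns = line.split()  # split on any run of whitespace; also drops the newline
        if not columns:
            continue  # skip blank lines between topics
        value = columns[0]

        if value.startswith('TOPIC:'):
            # header line: columns[1] is the topic's own weight (e.g. 2056),
            # parsed here but not used yet
            total_values = float(columns[1])
            print(columns[0])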

But I am not sure how to proceed from here: this only prints the topic names. Please help!
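
One way to continue from that loop, sketched under the assumption that every word line belongs to the most recent TOPIC: header: keep the current topic name and a running total, and emit the pair whenever a new header starts and once more at end of input. The function name `topic_sums` is hypothetical; it accepts any iterable of lines, so an open file object works as well as the list used here.

```python
def topic_sums(lines):
    """Yield (topic_name, total_weight) for each TOPIC: block, in input order."""
    current = None  # name of the topic being summed, e.g. 'TOPIC:topic_0'
    total = 0.0     # running sum of word weights for `current`
    for line in lines:
        columns = line.split()
        if not columns:
            continue  # blank separator line
        if columns[0].startswith('TOPIC:'):
            if current is not None:
                yield current, total  # previous block is complete
            current, total = columns[0], 0.0
        else:
            total += float(columns[1])  # second column is the word's weight
    if current is not None:
        yield current, total  # last block has no following header


sample = [
    "TOPIC:topic_0 2056", " ab  2.0", " cd  5.0", " ef  3.0", " gh  10.0", "",
    "TOPIC:topic_1 1000", " aa  3.0", " bd  5.0", " gh  2.0",
]
for name, total in topic_sums(sample):
    # %g prints 20.0 as "20", matching the requested output
    print("%s %g" % (name.replace('TOPIC:', 'Topic:'), total))
```

With the real file, the call would be `for name, total in topic_sums(in_file):` inside the existing `with open('Input.txt') as in_file:` block. Because it only holds one running total at a time, this approach never keeps all 2000 topics in memory.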

Upvotes: 1

Views: 504

Answers (2)

zedfoxus

Reputation: 37069

Try this; it works with both Python 2.7 and 3.5:

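import re

topic = {}                # topic name -> running sum of word weights
temp = ''                 # name of the topic currently being read
p = re.compile('[a-z]*')  # matches the lowercase word so it can be stripped out

with open('Input.txt') as in_file:
    for line in in_file:
        line = line.strip()
        if not line:
            continue  # skip blank separator lines

        if line.startswith('TOPIC:'):
            # header line, e.g. "TOPIC:topic_0 2056": start a new running sum
            temp = line.split(' ')[0].replace('TOPIC:', '')
            topic[temp] = 0.0
        else:
            # word line, e.g. "ab  2.0": remove the word, leaving only the weight
            value = p.sub('', line).strip()
            topic[temp] += float(value)

for key in topic:
    print("Topic:%s %s" % (key, topic[key]))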

Result:

$ /c/Python27/python.exe input.py
Topic:topic_1 10.0
Topic:topic_0 20.0
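
The topics print out of order here because a plain `dict` in Python 2.7 and 3.5 does not remember insertion order (from Python 3.7 on, it does). If file order matters, `collections.OrderedDict` keeps it. This sketch makes that one change and, as a second assumption, takes the weight as the last whitespace-separated field instead of stripping lowercase letters with the regex, so a word containing capitals or digits would still parse. The name `sum_by_topic` and the in-memory sample are illustrative only.

```python
from collections import OrderedDict


def sum_by_topic(lines):
    """Map topic name -> summed word weight, keeping the order topics appear in."""
    totals = OrderedDict()
    current = None  # topic whose word lines are being read
    for line in lines:
        columns = line.split()
        if not columns:
            continue  # blank separator line
        if columns[0].startswith('TOPIC:'):
            current = columns[0].replace('TOPIC:', '')
            totals[current] = 0.0
        else:
            # assumption: the weight is always the last field on a word line
            totals[current] += float(columns[-1])
    return totals


sample = [
    "TOPIC:topic_0 2056", " ab  2.0", " cd  5.0", " ef  3.0", " gh  10.0", "",
    "TOPIC:topic_1 1000", " aa  3.0", " bd  5.0", " gh  2.0",
]
for key, value in sum_by_topic(sample).items():
    print("Topic:%s %s" % (key, value))
```

To read the real data, pass the open file object instead of `sample`.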

Upvotes: 1

Dyno Fu

Reputation: 9044

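from __future__ import print_function  # lets print() behave the same on Python 2 and 3
import re

data = """
TOPIC:topic_0 2056
 ab  2.0
 cd  5.0
 ef  3.0
 gh  10.0

 TOPIC:topic_1 1000
 aa  3.0
 bd  5.0
 gh  2.0
"""

result = {}  # topic header -> list of word weights in that topic
for line in data.splitlines():
    line = line.strip()
    if not line:
        continue  # skip blank lines

    columns = re.split(r"\s+", line)  # split on any run of whitespace
    value = columns[0]
    if value.startswith('TOPIC:'):
        # header line: start a new list and make it the current target
        points = result[value] = []
        continue

    # word line: collect its weight under the current topic
    points.append(float(columns[1]))

for k, v in result.items():
    print(k, sum(v))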
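
`result.items()` comes back in arbitrary order on older Pythons, and a plain `sorted(result)` would put `TOPIC:topic_10` before `TOPIC:topic_2`, because strings compare character by character. With topics running up to topic_2000, one option is to sort on the numeric suffix. This sketch assumes every key ends in `_<number>` as in the question's data; the helper `topic_number` and the weights in `result` are hypothetical, illustrative values.

```python
def topic_number(key):
    """Numeric suffix of a key like 'TOPIC:topic_12' (assumes an '_<int>' ending)."""
    return int(key.rsplit('_', 1)[1])


# illustrative data in the same shape as `result` above (topic -> list of weights)
result = {
    'TOPIC:topic_10': [1.0, 2.0],
    'TOPIC:topic_2': [4.0],
    'TOPIC:topic_0': [2.0, 5.0, 3.0, 10.0],
}

print(sorted(result))  # ['TOPIC:topic_0', 'TOPIC:topic_10', 'TOPIC:topic_2']
for k in sorted(result, key=topic_number):
    print("%s %g" % (k, sum(result[k])))  # topic_0, topic_2, topic_10 in that order
```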

Upvotes: 1
