owwoow14

Reputation: 1754

summing vectors in Python

Hi, I am trying to sum the 3rd column of two input files, matching rows on the 2nd column, as in the following example:

input1:

act hi  1
act bye 2
act ciao    5

input2:

art hi  1
art bye 2
art kiss    5

with the following desired output:

act-art hi  2
act-art bye 4
act-art kiss    5
act-art ciao    5

Below is the code that I have been working with.

def sumVectors(classB_infile, classA_infile, outfile):

    class_dictA = {}

    with open(classA_infile, "rb") as opened_infile_A:
        for line in opened_infile_A:
            items = line.split()
            classA, feat, valuesA = items[:3]
            class_dictA[feat] = float(valuesA)


    class_dictB = {}

    with open(classB_infile, "rb") as opened_infile_B:
        for line in opened_infile_B:
            items = line.split()
            classB, feat, valuesB = items[:3]
            class_dictB[feat] = float(valuesB)


#print classA, classB, feat, sumVectors

####outfile 
    with open(outfile, "wb") as output_file:
        for key in class_dictA:
            if key in class_dictB:
                weight = class_dictA[key] + class_dictB[key]
                #outstring = "\t".join([classA + "-" +  classB, key, str(weight)])
            else:
                weight = class_dictA[key]
                outstring = "\t".join([classA + "-" +  classB, key, str(weight)])
                output_file.write(outstring + "\n")

        for key in class_dictB:
            if key in class_dictA:
                weight = class_dictB[key]
            outstring = "\t".join([classA + "-" + classB, key, str(weight)])
            output_file.write(outstring + "\n")

However, it gives me the following output:

act-art stress  5.0
act-art bye 2.0
act-art hi  1.0
act-art kiss    1.0

Any insight as to why it is not summing the values for the entries that the two files have in common in the 2nd column? Thank you.

Upvotes: 0

Views: 931

Answers (2)

Frerich Raabe

Reputation: 94549

Instead of writing two loops to implement the "merging" of the two dictionaries, I recommend using a defaultdict:

import collections

result = collections.defaultdict(float, class_dictA)
for k, v in class_dictB.items():
    result[k] += v

This creates a new result dictionary that starts as a copy of class_dictA, then adds every value in class_dictB to it. If a key doesn't exist yet, it's treated as if it had the value 0.0 (which is what calling float() with no arguments returns).
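To see the effect on the question's sample data, here is a minimal sketch. It assumes the two files have already been parsed into class_dictA and class_dictB as in the question; the literal dictionaries below stand in for that parsing step.

```python
import collections

# Stand-ins for the dictionaries built from the question's input1 and input2.
class_dictA = {"hi": 1.0, "bye": 2.0, "ciao": 5.0}
class_dictB = {"hi": 1.0, "bye": 2.0, "kiss": 5.0}

# Start from a copy of A; missing keys default to float(), i.e. 0.0.
result = collections.defaultdict(float, class_dictA)
for k, v in class_dictB.items():
    result[k] += v

# Print in sorted key order so the output is stable.
for key in sorted(result):
    print("\t".join(["act-art", key, str(result[key])]))
```

Keys present in both inputs (hi, bye) are summed, while keys present in only one (ciao, kiss) keep their single value. class_dictA itself is left unmodified.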

Upvotes: 3

BartoszKP

Reputation: 35921

This contains the simplest fixes to achieve the desired result:

def sumVectors(classB_infile, classA_infile, outfile):
    class_dictA = {}

    with open(classA_infile, "rb") as opened_infile_A:
        for line in opened_infile_A:
            items = line.split()
            classA, feat, valuesA = items[:3]
            class_dictA[feat.strip()] = float(valuesA)


    class_dictB = {}

    with open(classB_infile, "rb") as opened_infile_B:
        for line in opened_infile_B:
            items = line.split()
            classB, feat, valuesB = items[:3]
            class_dictB[feat.strip()] = float(valuesB)

    ####outfile 
    with open(outfile, "wb") as output_file:
        for key in class_dictA:
            if key in class_dictB:
                weight = class_dictA[key] + class_dictB[key]
                outstring = "\t".join([classA + "-" +  classB, key, str(weight)])
            else:
                weight = class_dictA[key]
                outstring = "\t".join([classA + "-" +  classB, key, str(weight)])
            output_file.write(outstring + "\n")

        for key in class_dictB:
            if key not in class_dictA: # if key was in A it was processed already
                weight = class_dictB[key]
                outstring = "\t".join([classA + "-" + classB, key, str(weight)])
                output_file.write(outstring + "\n")

However, this can be simplified considerably:

def readFile(fileName, keys):
    result = {}
    class_ = ''
    with open(fileName, "rb") as opened_infile_A:
        for line in opened_infile_A:
            items = line.split()
            class_, feat, value = items[:3]
            keys.add(feat)
            result[feat] = float(value)
    return (class_, result)


def sumVectors(classB_infile, classA_infile, outfile):
    keys = set()

    classA, class_dictA = readFile(classA_infile, keys)
    classB, class_dictB = readFile(classB_infile, keys)

    with open(outfile, "wb") as output_file:
        for key in keys:
            weightA = class_dictA[key] if key in class_dictA else 0
            weightB = class_dictB[key] if key in class_dictB else 0
            weight = weightA + weightB
            outstring = "\t".join([classA + "-" +  classB, key, str(weight)])
            output_file.write(outstring + "\n")
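The key-union idea can also be checked without touching any files. The following is a minimal sketch, assuming the inputs are already parsed into dictionaries; merge_weights is a hypothetical helper introduced only for illustration, and it uses dict.get in place of the conditional expressions.

```python
def merge_weights(dict_a, dict_b):
    """Sum two feature -> weight dicts over the union of their keys."""
    keys = set(dict_a) | set(dict_b)
    # dict.get supplies 0.0 for a feature that appears in only one input.
    return {key: dict_a.get(key, 0.0) + dict_b.get(key, 0.0) for key in keys}


# Stand-ins for the dictionaries built from the question's input1 and input2.
weights_a = {"hi": 1.0, "bye": 2.0, "ciao": 5.0}
weights_b = {"hi": 1.0, "bye": 2.0, "kiss": 5.0}

merged = merge_weights(weights_a, weights_b)
for key in sorted(merged):
    print("\t".join(["act-art", key, str(merged[key])]))
```

Because the values are stored as floats, the sums print as 2.0 and 4.0 rather than 2 and 4. If integer output is wanted, parsing with int() instead of float() would be one option, assuming the column never holds fractional values.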

Upvotes: 4
