Using a matrix format in Python to calculate my own similarity score

Question

I have a CSV file containing the values of commodities traded between countries, something like this:

Country  Comm  Value
GER      1     200
GER      2     300
GER      45    354
USA      2     100
USA      85    500
UK       2     240
UK       85    900

I have created a matrix from this data. Its rows are countries, its columns are commodity codes, and each element is the value of trade. There are 97 commodities, and I used the following code to create the matrix:

rfile = open('file path', 'r')
next(rfile)  # skip the header line
dic_c1_products = {}
for line in rfile:
    lns = line.strip().split(',')
    c1 = lns[0]
    p = lns[1]
    value = lns[2]
    # collect (commodity, value) pairs per country
    if c1 not in dic_c1_products:
        dic_c1_products[c1] = [(p, value)]
    else:
        dic_c1_products[c1].append((p, value))
rfile.close()
product_count = 97
c1_list = list(dic_c1_products.keys())
# one row per country; the column index is the commodity code (column 0 stays unused)
matrix_c1_products = [[0 for col in range(product_count + 1)] for row in range(len(c1_list))]
for c1 in dic_c1_products:
    for p, v in dic_c1_products[c1]:
        matrix_c1_products[c1_list.index(c1)][int(p)] = int(v)
print('Matrix Done')

Now I want to calculate an index score for each pair of countries. The pair score is the total trade in the commodities that both countries trade, divided by the combined total trade of the two countries. The matrix has a form like this:

Countries   Commodity1  Commodity2  Commodity45  Commodity85
GER         200         300         354          0
USA         0           100         0            500
UK          0           240         0            900

First I want to sum the values of the SAME commodities that two countries both trade, and then divide this amount by the TOTAL trade of those two countries. For example, GER and USA both trade commodity number 2, so I want the sum of these common commodities (300+100) over the sum of the total trade of Germany and the United States: (first row: 200+300+354) + (second row: 100+500).

In simple words, for the GER and USA rows of the matrix:
1. Calculate the total of all values in the two rows.
2. Calculate the total of the values of the common commodities, i.e. those traded by both countries.
3. Divide the result of step 2 by the result of step 1.

To do this, I have written the following code:

for i in range(len(matrix_c1_products)):
    for j in range(i, len(matrix_c1_products)):
        dividend=sum([matrix_c1_products[i]])+sum([matrix_c1_products[j]])
        for k in matrix_c1_products[i]:
            for l in matrix_c1_products[j]:
                # print k,l
                if int(k)==int(0):
                    pass
                if int(l)==int(0):
                    pass
                else:
                    commonone.append(k)
                    commontwo.append(l)
        divisor=sum(commonone)+sum(commontwo)
        shares=int(divisor/dividend)
        print shares, divisor, dividend

However, there is a problem with the commonone list. I intend to skip the zeros in the two rows and add up the remaining values, but because of the nested loops the same numbers are appended to the list repeatedly, so the results are not correct. Any help would be appreciated.
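
One possible way around the repeated values, offered as a minimal sketch rather than a definitive fix: the two nested loops compare every value of one row with every value of the other, whereas the score only needs values from the same column. Pairing the rows with zip visits each commodity once. The names pair_score, countries, matrix and scores are hypothetical, the rows are a four-column toy version of the sample data above (commodities 1, 2, 45, 85), and the sketch assumes every pair of different countries is wanted and that a pair with no trade at all should score 0. Note also that int(divisor/dividend) would truncate any ratio below 1 to 0, so the sketch keeps the result as a float.

```python
from itertools import combinations

# Toy rows taken from the sample CSV in the question
# (columns: commodities 1, 2, 45, 85).
countries = ["GER", "USA", "UK"]
matrix = [
    [200, 300, 354, 0],
    [0, 100, 0, 500],
    [0, 240, 0, 900],
]


def pair_score(row_a, row_b):
    """Trade in commodities both rows share, divided by the rows' combined total."""
    total = sum(row_a) + sum(row_b)
    if total == 0:
        return 0.0  # assumption: two countries with no trade at all score 0
    # zip pairs column k of row_a with column k of row_b,
    # so every commodity is looked at exactly once.
    common = sum(a + b for a, b in zip(row_a, row_b) if a != 0 and b != 0)
    return float(common) / total


# combinations yields each unordered pair of different rows once.
scores = {}
for i, j in combinations(range(len(matrix)), 2):
    scores[(countries[i], countries[j])] = pair_score(matrix[i], matrix[j])

for pair, score in sorted(scores.items()):
    print(pair, round(score, 4))
```

For GER and USA this gives (300+100) / (854+600), matching the worked example in the question. With the full 98-column matrix, matrix_c1_products and c1_list could be passed in place of the toy matrix and countries.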
