crypatix
crypatix

Reputation: 55

Comparing 2 excel files via Python. Is there any other recommended language to use instead of python?

reference file:
enter image description here
fill_in:
basically, the you're taking the values in col 1 (left) and comparing them with the values in the reference file (col1). If the values are an exact match, it will take the value in col2 from reference and place it into col2 of the fill_in file. (below)
enter image description here

So far, my codes is this :

import win32com.client, csv, os, string
# Office 2010 - Microsoft Office Object 14.0 Object Library
from win32com.client import gencache
gencache.EnsureModule('{2DF8D04C-5BFA-101B-BDE5-00AA0044DE52}', 0, 2, 5)
                                                                            #
# Office 2010 - Excel COM
from win32com.client import gencache
gencache.EnsureModule('{00020813-0000-0000-C000-000000000046}', 0, 1, 7)
#
Application = win32com.client.Dispatch("Excel.Application")
Application.Visible = True
Workbook = Application.Workbooks.Add()
Sheet = Application.ActiveSheet
#

#REFERENCE FILE
f = open("reference_file.csv", "rb")
ref = csv.reader(f)
ref_dict = dict()

#FILE WITH BLANKS
g = open("fill_in.csv", "rb")
fill = csv.reader(g)
fill_dict = dict()

#CODE STARTS
gene_dic = dict()
count = 0
#Make reference file into a dictionary
for line in ref:
    ref_dict[line[1]] = [line[0]]
#Make Fill in file into a dictionary
for i in fill:
    fill_dict[i[1]] = [i[0]]
#finding difference in both dictionaries
diff = {}
for key in ref_dict.keys():
    if(not fill_dict.has_key(key)):
        diff[key] = (ref_dict[key])
    elif(ref_dict[key] != fill_dict[key]):
        diff[key] = (ref_dict[key], fill_dict[key])
for key in fill_dict.keys():
    if(not ref_dict.has_key(key)):
        diff[key] = (fill_dict[key])
fill_dict.update(diff)
print fill_dict
#Put dictionary into an Array
temp = []
dictlist = []
for key, value in fill_dict.iteritems():
    temp = [key, value]
    dictlist.append(temp)
    dictlist.sort()
print(dictlist)

for i in dictlist:
    count += 1
    Sheet.Range("A" + str(count)).Value = i[0]
    Sheet.Range("B" + str(count)).Value = i[1]

Workbook.SaveAs(os.getcwd() + "/" + "result1.csv")

The results is this:
enter image description here


But the supposed result was suppose to be like this:
enter image description here

If in column 2(column B), there is a value, it should remain untouched. If there's an empty cell, and it has a match in the reference file, it would print the number into columnB

I've also tried this code, however i've only manage to put it in a list, not in excel :

r=open("reference_file.csv","rb")
ref = csv.reader(r)
ref_dict = dict()

f=open("fill_in.csv", "rb")
fill = csv.reader(f)

#CODE STARTS
lst = []
lstkey = []
count = 0
#put reference file in a dictionary
for line in ref:
    ref_dict[line[1]] = [line[0]]


all_values = defaultdict(list)
for key in ref_dict:
    for value in (map(lambda x: x.strip(), ref_dict[key][0].split(","))):
        all_values[value].append(key)

for i in lst:
    lstkey.append(all_values[i])
print lstkey

Upvotes: 0

Views: 365

Answers (2)

bl618515
bl618515

Reputation: 107

I dont know if there is any specific language to use when operating with excel files, but for sure you can use ruby. I personally find ruby codes easier to understand and would use ruby for a task like this. You can check out this topic where they parse an excel file and do some checks. Hope it helps.

Upvotes: 1

Ed Doxtator
Ed Doxtator

Reputation: 74

Couple of thoughts:

  • If you're thinking about using non-Python solutions, Have you tried VBA (Visual Basic For Applications)?

  • If you stick with Python, take a look at John Machin's outstanding xlrd (Excel Read) and xlwt (Excel Write) tools. You can find them on http://www.python-excel.org. With a little playing around, you should be able to apply the results of the list you generated to a new spreadsheet or workbook using xlwt.

Upvotes: 0

Related Questions