Yousra Gad
Yousra Gad

Reputation: 363

compare a dictionary with excel file

I have a file containing words and dictionary containing each word as key with the score as value:

data.xlsx

1. one  neg_three
2. one  two

dictionary :

num_dic = {'one':'1','two':'2','neg_three':'-3',....}

What I want is compare each key with each row in the excel file and sum the value of all words using the values in the dictionary and save the result in the last column in the excel file. here is my code

import xlrd

loc = ("data.xlsx")
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
sheet.cell_value(0, 0)
list1 = []
for i in range(1, sheet.nrows): # start at 1 skips header row
    list1.append(sheet.cell_value(i, 0))

# save the data to a list and then converted to a string for spliting 
str1 = ''.join(list1)
list2 = str1.spilt()

# sum the values
total = 0
for i in list2:
 for key, value in num_dic.items():
   if key in i:
        total =+ int(value)

# save the result in excel file
 expData = pd.DataFrame(total, columns = ['total'])
 writer = ExcelWriter('data.xlsx')
 expData.to_excel(writer,'Sheet1',index=False,startcol=1)
 writer.save()

the output should be like this:

data.xlsx:
     data              total
1. one  neg_three       -2
2. one  two              3

can anyone advise me in my code? because it doesn't work. Thanks in advance

Upvotes: 0

Views: 687

Answers (2)

Surani Matharaarachchi
Surani Matharaarachchi

Reputation: 589

Hope this what you are searching for,

import xlrd
import pandas as pd
num_dic = {'one':'1','two':'2','neg_three':'-3'}

loc = ("data.xlsx")
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
sheet.cell_value(0, 0)
list1 = []
total = []
for i in range(1, sheet.nrows): # start at 1 skips header row
    result = sheet.cell_value(i, 0).split(" ")
    num1 = 0
    for j in result:
        if j:
            num1 += int(num_dic[j])
    total.append(num1)

sheet1 = pd.read_excel('data.xlsx')
sheet1['total']=total
writer = pd.ExcelWriter('data.xlsx')
sheet1.to_excel(writer,'Sheet1',index=False,startcol=0)
writer.save()

enter image description here

Upvotes: 1

Edwin van Mierlo
Edwin van Mierlo

Reputation: 2488

I don't have xlrd installed, so the following is in theory only, I did not test it.

Lets look at your code block by block:

import xlrd

loc = ("data.xlsx")
wb = xlrd.open_workbook(loc)
sheet = wb.sheet_by_index(0)
sheet.cell_value(0, 0)
list1 = []
for i in range(1, sheet.nrows): # start at 1 skips header row
    list1.append(sheet.cell_value(i, 0))

This should do roughly what you want, and appends the read input data to list1. Except it will not read the last row of your data: you may have to consider to change your range() to range(1, sheet.nrows + 1)

Next you .join() it all together, and then do a .split() on it:

# save the data to a list and then converted to a string for spliting 
str1 = ''.join(list1)
list2 = str1.spilt()

I would not do this at all, just leave list1 as is. You can split the values while computing.

Then you use two for loops to do some sort of calculation, and store them all into a variable called total:

# sum the values
total = 0
for i in list2:
 for key, value in num_dic.items():
   if key in i:
        total =+ int(value)

This is not really doing what you want, you want to store a total for each row of input data, which then can be written to your result file. Something like this:

# use actual integers in your lookup table
num_dic = {'one':1,'two':2,'neg_three':-3} 

list2 = []  ## this is your list containing the totals for each row
for in_row in list1:
    row, string_value1, string_value2 = in_row.split()
    if string_value1 and string_value2:
        # values are not empty continue with calculation
        # use dict.get() to ensure a default value of 0 if not found
        total = num_dic.get(string_value1, 0) + num_dic.get(string_value2, 0)
        # append the total to the list
        list2.append(str(total))
    else:
        # one or both of the values is empty
        # appending an empty string to the result list2
        list2.append('')
# at this point list2 contains all your results for each row of input

Your last block is writing the results to your file:

# save the result in excel file
 expData = pd.DataFrame(total, columns = ['total'])
 writer = ExcelWriter('data.xlsx')
 expData.to_excel(writer,'Sheet1',index=False,startcol=1)
 writer.save()

As I said I do not have xlrd, so I cannot comment on writing the file, so you now need to write the list list2 to your "total" column in your XL file. How you do that may need some experimentation.

Upvotes: 0

Related Questions