Anil_M
Anil_M

Reputation: 11453

pandas :Read xlsx file to dict with column1 as key and column2 as values

I am new to pandas. I need to read a xlsx file and convert first column to key of a dict and second column to values of a dict using pandas. I also need to skip / exclude first row which are headers.

The answer here is for pymysql and here is for csv. I need to user pandas.

Here is a sample excel data

dict_key    dict_value  
key1        str_value1  
key2        str_value2  
key3         None  
key4         int_value3  

My code so far is as below.

import pandas as pd

excel_file = "file.xlsx"
xls = pd.ExcelFile(excel_file)
df = xls.parse(xls.sheet_names[0], skiprows=1, index_col=None, na_values=['None'])
data_dict = df.to_dict()

However, it gives me dict where keys are column numbers and values are both column1 data as well as column2 data.

>>> data_dict
{u'Chg_Parms': {0: u'  key1 ', 1: u'   key2 ', 2: u'   key3 ', 3: u'   key4 ', 4: u'   str_value1 ', 
                5: u'   str_value2 ', 6: u'   Nan ', 6: u'   int_value3 '}}

what I would like to have is column1 data as key and column two data as values and also NaN replaced with None

data_dict = {'key1': 'str_value1', 'key2': 'str_value2', 'key3': None, 'key4': int_value3}

Thanks for your help.

Upvotes: 3

Views: 14976

Answers (2)

mechanical_meat
mechanical_meat

Reputation: 169314

You can use a collections.OrderedDict to keep the keys in order. You'll note that pd.read_excel loads the first sheet by default. Edit: then you say you want to encode the items in the dictionary, and evaluate 'None' as None...

import collections as co
import pandas as pd

df = pd.read_excel('file.xlsx')
df = df.where(pd.notnull(df), None)
od = co.OrderedDict((k.strip().encode('utf8'),v.strip().encode('utf8')) 
                    for (k,v) in df.values)

Result:

>>> od
OrderedDict([(u'key1', u'str_value1'), (u'key2', u'str_value2'), (u'key3', u'None'), (u'key4', u'int_value3')])

General note: you should keep strings as Unicode within your Python program.

Upvotes: 1

johannesmik
johannesmik

Reputation: 751

You can use pandas read_excel method to read the excel file more conveniently. You can pass a index_col argument where you can define which column of your xlsx is the index.

How to change NaN to None is explained in this question.

Given an xlsx file called example.xlsx which is build like you wrote above, the following code should give your expected results:

import pandas as pd

df = pd.read_excel("example.xlsx", index_col=0)
df = df.where(pd.notnull(df), None)

print df.to_dict()["dict_value"]

Upvotes: 3

Related Questions