Reputation: 11453
I am new to pandas. I need to read a xlsx
file and convert first column to key of a dict and second column to values of a dict using pandas
. I also need to skip / exclude first row which are headers.
The answer here is for pymysql
and here is for csv
. I need to user pandas
.
Here is a sample excel data
dict_key dict_value
key1 str_value1
key2 str_value2
key3 None
key4 int_value3
My code so far is as below.
import pandas as pd
excel_file = "file.xlsx"
xls = pd.ExcelFile(excel_file)
df = xls.parse(xls.sheet_names[0], skiprows=1, index_col=None, na_values=['None'])
data_dict = df.to_dict()
However, it gives me dict where keys are column numbers and values are both column1 data as well as column2 data.
>>> data_dict
{u'Chg_Parms': {0: u' key1 ', 1: u' key2 ', 2: u' key3 ', 3: u' key4 ', 4: u' str_value1 ',
5: u' str_value2 ', 6: u' Nan ', 6: u' int_value3 '}}
what I would like to have is column1 data as key and column two data as values and also NaN
replaced with None
data_dict = {'key1': 'str_value1', 'key2': 'str_value2', 'key3': None, 'key4': int_value3}
Thanks for your help.
Upvotes: 3
Views: 14976
Reputation: 169314
You can use a collections.OrderedDict
to keep the keys in order. You'll note that pd.read_excel
loads the first sheet by default. Edit: then you say you want to encode the items in the dictionary, and evaluate 'None'
as None
...
import collections as co
import pandas as pd
df = pd.read_excel('file.xlsx')
df = df.where(pd.notnull(df), None)
od = co.OrderedDict((k.strip().encode('utf8'),v.strip().encode('utf8'))
for (k,v) in df.values)
Result:
>>> od
OrderedDict([(u'key1', u'str_value1'), (u'key2', u'str_value2'), (u'key3', u'None'), (u'key4', u'int_value3')])
General note: you should keep strings as Unicode within your Python program.
Upvotes: 1
Reputation: 751
You can use pandas read_excel method to read the excel file more conveniently. You can pass a index_col argument where you can define which column of your xlsx is the index.
How to change NaN to None is explained in this question.
Given an xlsx file called example.xlsx which is build like you wrote above, the following code should give your expected results:
import pandas as pd
df = pd.read_excel("example.xlsx", index_col=0)
df = df.where(pd.notnull(df), None)
print df.to_dict()["dict_value"]
Upvotes: 3