maynull

Reputation: 2046

How can I set index while converting dictionary to dataframe?

I have a dictionary that looks like this:

defaultdict(list,
        {'Open': ['47.47', '47.46', '47.38', ...],
         'Close': ['47.48', '47.45', '47.40', ...],
         'Date': ['2016/11/22 07:00:00', '2016/11/22 06:59:00','2016/11/22 06:58:00', ...]})

My goal is to convert this dictionary to a DataFrame and use the 'Date' values as its index.

I can do this with the following commands:

df = pd.DataFrame(dictionary, columns=['Date', 'Open', 'Close'])
df.index = df.Date

Output:

               Date                  Date    Open   Close
2016/11/22 07:00:00   2016/11/22 07:00:00   47.47   47.48
2016/11/22 06:59:00   2016/11/22 06:59:00   47.46   47.45
2016/11/22 06:58:00   2016/11/22 06:58:00   47.38   47.40

But then I have two 'Date' columns: one is the index and the other is the original column.

Is there a way to set the index while converting the dictionary to a DataFrame, without the duplicated column, so that the result looks like this?

               Date   Close    Open
2016/11/22 07:00:00   47.48   47.47
2016/11/22 06:59:00   47.45   47.46
2016/11/22 06:58:00   47.40   47.38

Upvotes: 39

Views: 50565

Answers (2)

cottontail

Reputation: 23071

If the original dictionary is not needed afterwards, an alternative is to pop the 'Date' key and pass its values as the index. Note that pop removes the key from the dictionary itself.

df = pd.DataFrame(mydict, index=pd.Series(mydict.pop('Date'), name='Date'))

That said, I think set_index is the more convenient and less verbose option that can be called immediately on the newly created frame:

df = pd.DataFrame(mydict).set_index('Date')
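
If the dictionary is still needed later, the pop approach can be applied to a shallow copy instead. This is only a minimal sketch: the three-row mydict below is a hand-built stand-in for the dictionary in the question, and the names data and dates are made up for illustration.

```python
import pandas as pd

# Small stand-in for the dictionary in the question (assumed sample data).
mydict = {
    "Open": ["47.47", "47.46", "47.38"],
    "Close": ["47.48", "47.45", "47.40"],
    "Date": ["2016/11/22 07:00:00", "2016/11/22 06:59:00", "2016/11/22 06:58:00"],
}

# Pop from a shallow copy so that mydict keeps its 'Date' key.
data = dict(mydict)
dates = data.pop("Date")

# Build the frame from the remaining keys and use the dates as a named index.
df = pd.DataFrame(data, index=pd.Index(dates, name="Date"))
print(df)
```

The copy is shallow, so the lists themselves are shared with mydict; only the set of keys is independent.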


Upvotes: 2

jezrael

Reputation: 862511

Use set_index:

df = pd.DataFrame(dictionary, columns=['Date', 'Open', 'Close'])
df = df.set_index('Date')
print(df)
                      Open  Close
Date                             
2016/11/22 07:00:00  47.47  47.48
2016/11/22 06:59:00  47.46  47.45
2016/11/22 06:58:00  47.38  47.40
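
The duplicate column goes away because set_index removes the column it promotes to the index by default (its drop parameter defaults to True), whereas assigning df.index = df.Date leaves the column in place. A minimal sketch of the difference, assuming a three-row sample in place of the full dictionary; the names indexed and kept are made up for illustration:

```python
from collections import defaultdict

import pandas as pd

# Three-row stand-in for the dictionary in the question (assumed sample data).
dictionary = defaultdict(list, {
    "Open": ["47.47", "47.46", "47.38"],
    "Close": ["47.48", "47.45", "47.40"],
    "Date": ["2016/11/22 07:00:00", "2016/11/22 06:59:00", "2016/11/22 06:58:00"],
})

df = pd.DataFrame(dictionary, columns=["Date", "Open", "Close"])

# Default behaviour (drop=True): 'Date' becomes the index and leaves the columns.
indexed = df.set_index("Date")

# drop=False keeps 'Date' as a column too, like df.index = df.Date does.
kept = df.set_index("Date", drop=False)

print(indexed.columns.tolist())
print(kept.columns.tolist())
```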

Or use inplace=True to modify the existing frame:

df = pd.DataFrame(dictionary, columns=['Date', 'Open', 'Close'])
df.set_index('Date', inplace=True)
print(df)
                      Open  Close
Date                             
2016/11/22 07:00:00  47.47  47.48
2016/11/22 06:59:00  47.46  47.45
2016/11/22 06:58:00  47.38  47.40

Another possible solution is to filter the 'Date' key out of the dictionary and pass dictionary['Date'] as the index:

df = pd.DataFrame({k: v for k, v in dictionary.items() if k != 'Date'},
                  index=dictionary['Date'],
                  columns=['Open', 'Close'])
df.index.name = 'Date'
print(df)
                      Open  Close
Date                             
2016/11/22 07:00:00  47.47  47.48
2016/11/22 06:59:00  47.46  47.45
2016/11/22 06:58:00  47.38  47.40

Upvotes: 46
