Reputation: 445
So my dataframe is made from lots of individual excel files, each with the the date as their file name and the prices of the fruits on that day in the spreadsheet, so the spreadsheets look something like this:
15012016:
Fruit Price
Orange 1
Apple 2
Pear 3
16012016:
Fruit Price
Orange 4
Apple 5
Pear 6
17012016:
Fruit Price
Orange 7
Apple 8
Pear 9
So to put all that information together I run the following code to put all the information into a dictionary of dataframes (all fruit price files stored in 'C:\Fruit_Prices_by_Day'
#find all the file names
file_list = []
for x in os.listdir('C:\Fruit_Prices_by_Day'):
file_list.append(x)
file_list= list(set(file_list))
d = {}
for date in Raw_list:
df1 = pd.read_excel(os.path.join('C:\Fruit_Prices_by_Day', date +'.xlsx'), index_col = 'Fruit')
d[date] = df1
Then this is the part where I'm stuck. How do I then make this dict into a dataframe where the column names are the dict keys i.e. the dates, so I can get the price of each fruit per day all in the same dataframe like:
15012016 16012016 17012016
Orange 1 4 7
Apple 2 5 8
Pear 3 6 9
Upvotes: 33
Views: 36964
Reputation: 2554
Solution:
pd.concat(d, axis=1).sum(axis=1, level=0)
Explanation:
After .concat(d, axis=1)
you will get
15012016 16012016 17012016
Price Price Price
Fruit
Orange 1 4 7
Apple 2 5 8
Pear 3 6 9
And adding .sum(axis=1, level=0)
transforms it to
15012016 16012016 17012016
Fruit
Orange 1 4 7
Apple 2 5 8
Pear 3 6 9
Upvotes: 8
Reputation: 863166
You can try first set_index
of all dataframes in comprehension
and then use concat
with remove last level of multiindex
in columns:
print d
{'17012016': Fruit Price
0 Orange 7
1 Apple 8
2 Pear 9, '16012016': Fruit Price
0 Orange 4
1 Apple 5
2 Pear 6, '15012016': Fruit Price
0 Orange 1
1 Apple 2
2 Pear 3}
d = { k: v.set_index('Fruit') for k, v in d.items()}
df = pd.concat(d, axis=1)
df.columns = df.columns.droplevel(-1)
print df
15012016 16012016 17012016
Fruit
Orange 1 4 7
Apple 2 5 8
Pear 3 6 9
Upvotes: 25
Reputation: 1195
Something like this could work: loop over the dictionary, add the constant column with the dictionary key, concatenate and then set the date as index
pd.concat(
(i_value_df.assign(date=i_key) for i_key, i_value_df in d.items())
).set_index('date')
Upvotes: 6