Reputation: 41
I am new to Python and I'm trying to understand how to manipulate data with pandas DataFrames. I searched for similar questions but didn't find any that meets my exact need. Please point me to the correct post if this is a duplicate.
So I have multiple DataFrames with the exact same shape, columns and index. How do I combine them with labels so I can easily access the data with any column/index/label?
For example, after the setup below, how do I put df1 and df2 into one DataFrame and label them with the names 'df1' and 'df2', so that I can access data with something like df['A']['df1']['b'] and also get the number of rows of df?
>>> import numpy as np
>>> import pandas as pd
>>> df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'], index=['a', 'b'])
>>> df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['A', 'B'], index=['a', 'b'])
>>> df1
   A  B
a  1  2
b  3  4
>>> df2
   A  B
a  5  6
b  7  8
Upvotes: 2
Views: 14011
Reputation: 1201
Using xarray is recommended, as other answers to similar questions have suggested, since pandas Panels were deprecated in favour of xarray.
Upvotes: 0
Reputation: 862801
I think a MultiIndex DataFrame created by concat is the answer:
df = pd.concat([df1, df2], keys=('df1','df2'))
print (df)
       A  B
df1 a  1  2
    b  3  4
df2 a  5  6
    b  7  8
Then, for basic selection, you can use xs:
print (df.xs('df1'))
   A  B
a  1  2
b  3  4
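xs can also take a cross-section on the inner level, which pulls the same row out of every labelled frame. A minimal sketch, reusing df1 and df2 from the question (the name row_b is only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'], index=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['A', 'B'], index=['a', 'b'])
df = pd.concat([df1, df2], keys=('df1', 'df2'))

# level=1 is the original (inner) index, so this selects row 'b'
# from both frames; the result is indexed by the labels 'df1' and 'df2'.
row_b = df.xs('b', level=1)
print(row_b)
```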
To select by index and columns together, use slicers:
idx = pd.IndexSlice
print (df.loc[idx['df1', 'b'], 'A'])
3
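The access pattern from the question, df['A']['df1']['b'], should also work on this MultiIndex DataFrame, and len gives the row counts. A small sketch under the same setup (variable names such as total_rows are only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'], index=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['A', 'B'], index=['a', 'b'])
df = pd.concat([df1, df2], keys=('df1', 'df2'))

# Chained access: column first, then the frame label, then the row label.
value = df['A']['df1']['b']

# The same cell in a single .loc call, which avoids chained indexing.
same_value = df.loc[('df1', 'b'), 'A']

# Row counts: all stacked rows, and the rows of one labelled frame.
total_rows = len(df)
rows_in_df1 = len(df.xs('df1'))
print(value, same_value, total_rows, rows_in_df1)
```

The single .loc call is usually the safer choice when assigning values, since chained indexing may operate on a copy.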
Another possible solution is to use panels, but Panel is deprecated in newer versions of pandas (deprecated in 0.20 and removed in 0.25).
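If the original row count should be preserved, one alternative layout (not from the answer above, just a sketch) is to concatenate along the columns so the labels become a column level. The name wide is only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'], index=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['A', 'B'], index=['a', 'b'])

# axis=1 places the frames side by side; keys become the outer column level.
wide = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])

# Swap the column levels so the original column name comes first,
# then sort the columns to keep the MultiIndex lexsorted.
wide = wide.swaplevel(axis=1).sort_index(axis=1)

value = wide['A']['df1']['b']  # column, then frame label, then row label
n_rows = len(wide)             # same number of rows as df1 and df2
print(wide)
```

With this layout, len(wide) matches the row count of the original frames, whereas the row-wise concat stacks them.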
Upvotes: 10