Reputation: 414
Consider the following DataFrame:
import pandas as pd
my_df = pd.DataFrame()
my_df.at[0,'tunnel1']=3
my_df.at[1,'tunnel1']=3
my_df.at[1,'tunnel2']=2
my_df.at[2,'tunnel1']=3
my_df.at[2,'tunnel2']=2
my_df.at[3,'tunnel1']=4
my_df.at[3,'tunnel2']=1
my_df.at[3,'tunnel3']=4
my_df.at[4,'tunnel1']=1
my_df.at[4,'tunnel2']=5
my_df.at[4,'tunnel3']=1
my_df.at[5,'tunnel1']=1
my_df.at[5,'tunnel2']=5
my_df.at[5,'tunnel3']=1
my_df.at[5,'tunnel4']=3
my_df.at[6,'tunnel1']=6
my_df.at[6,'tunnel2']=5
my_df.at[6,'tunnel3']=5
my_df.at[6,'tunnel4']=2
my_df['data1']='ham'
my_df['data2']='eggs'
my_df['data3']='coffee'
The DataFrame looks like this:
   tunnel1  tunnel2  tunnel3  tunnel4  data1  data2   data3
0      3.0      NaN      NaN      NaN    ham   eggs  coffee
1      3.0      2.0      NaN      NaN    ham   eggs  coffee
2      3.0      2.0      NaN      NaN    ham   eggs  coffee
3      4.0      1.0      4.0      NaN    ham   eggs  coffee
4      1.0      5.0      1.0      NaN    ham   eggs  coffee
5      1.0      5.0      1.0      3.0    ham   eggs  coffee
6      6.0      5.0      5.0      2.0    ham   eggs  coffee
Then set a MultiIndex:
my_df = my_df.set_index(['tunnel1', 'tunnel2', 'tunnel3', 'tunnel4'])
It now looks like this:
                                 data1  data2   data3
tunnel1 tunnel2 tunnel3 tunnel4
3.0     NaN     NaN     NaN        ham   eggs  coffee
        2.0     NaN     NaN        ham   eggs  coffee
                        NaN        ham   eggs  coffee
4.0     1.0     4.0     NaN        ham   eggs  coffee
1.0     5.0     1.0     NaN        ham   eggs  coffee
                        3.0        ham   eggs  coffee
6.0     5.0     5.0     2.0        ham   eggs  coffee
Now I want to slice it so that I get the rows for each unique entry of the MultiIndex:
for configuration in my_df.index.unique():
    mini_df = my_df.loc[configuration]
This raises:
pandas.core.indexing.IndexingError: Too many indexers
The first value of configuration is:
(3.0, nan, nan, nan)
and I believe this is what causes the error.
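If the NaN entries are the suspect, a quick check shows why they make awkward keys: NaN never compares equal to itself. A minimal sketch (the tuple is the first configuration printed above, typed in by hand); it only illustrates the NaN behaviour and does not by itself prove that NaN is what triggers the IndexingError:

```python
import numpy as np
import pandas as pd

# First configuration from the loop above, copied by hand.
key = (3.0, np.nan, np.nan, np.nan)

# NaN is never equal to itself, so equality-based label matching
# cannot find a NaN label the way it finds 3.0.
nan_equals_itself = np.nan == np.nan
print(nan_equals_itself)  # False

# pd.isna is the reliable way to detect the missing levels in a key.
missing = [bool(pd.isna(v)) for v in key]
print(missing)  # [False, True, True, True]
```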
What I want from my loop is one DataFrame per configuration:
mini_df
   tunnel1  tunnel2  tunnel3  tunnel4  data1  data2   data3
0      3.0      NaN      NaN      NaN    ham   eggs  coffee
mini_df'
   tunnel1  tunnel2  tunnel3  tunnel4  data1  data2   data3
1      3.0      2.0      NaN      NaN    ham   eggs  coffee
2      3.0      2.0      NaN      NaN    ham   eggs  coffee
mini_df''
   tunnel1  tunnel2  tunnel3  tunnel4  data1  data2   data3
3      4.0      1.0      4.0      NaN    ham   eggs  coffee
mini_df'''
   tunnel1  tunnel2  tunnel3  tunnel4  data1  data2   data3
4      1.0      5.0      1.0      NaN    ham   eggs  coffee
Any suggestions on what to try here please? Thanks for your help in advance.
Upvotes: 0
Views: 243
Reputation: 30920
Use DataFrame.xs + Index.get_level_values:
for id1 in my_df.index.get_level_values(0).unique():
    print(my_df.xs(id1))
You could save the DataFrames in a dict:
df_id1 = {id1: my_df.xs(id1) for id1 in my_df.index.get_level_values(0).unique()}
for key in df_id1:
    print(f'df_id1[{key}]')
    print('-' * 50)
    print(df_id1[key])
df_id1[3.0]
--------------------------------------------------
                         data1  data2   data3
tunnel2 tunnel3 tunnel4
NaN     NaN     NaN        ham   eggs  coffee
2.0     NaN     NaN        ham   eggs  coffee
                NaN        ham   eggs  coffee
df_id1[4.0]
--------------------------------------------------
                         data1  data2   data3
tunnel2 tunnel3 tunnel4
1.0     4.0     NaN        ham   eggs  coffee
df_id1[1.0]
--------------------------------------------------
                         data1  data2   data3
tunnel2 tunnel3 tunnel4
5.0     1.0     NaN        ham   eggs  coffee
                3.0        ham   eggs  coffee
df_id1[6.0]
--------------------------------------------------
                         data1  data2   data3
tunnel2 tunnel3 tunnel4
5.0     5.0     2.0        ham   eggs  coffee
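The pieces above lose the tunnel1 level because xs drops the selected level by default. If you want to keep it, DataFrame.xs also accepts level and drop_level arguments. A sketch (the data is rebuilt in one step from the same values, so the block runs on its own):

```python
import numpy as np
import pandas as pd

nan = np.nan
# Same values as the question, built in one step instead of cell by cell.
my_df = pd.DataFrame(
    {
        "tunnel1": [3, 3, 3, 4, 1, 1, 6],
        "tunnel2": [nan, 2, 2, 1, 5, 5, 5],
        "tunnel3": [nan, nan, nan, 4, 1, 1, 5],
        "tunnel4": [nan, nan, nan, nan, nan, 3, 2],
    },
    dtype=float,
).assign(data1="ham", data2="eggs", data3="coffee")
my_df = my_df.set_index(["tunnel1", "tunnel2", "tunnel3", "tunnel4"])

# drop_level=False keeps tunnel1 in the index of every piece.
df_id1 = {
    id1: my_df.xs(id1, level="tunnel1", drop_level=False)
    for id1 in my_df.index.get_level_values("tunnel1").unique()
}
print(df_id1[3.0])
```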
We can also use DataFrame.groupby:
for i, group in my_df.groupby(level=0):
    # in recent pandas versions, my_df.groupby('tunnel1') (the level name) also works
    print(group)
                                 data1  data2   data3
tunnel1 tunnel2 tunnel3 tunnel4
1.0     5.0     1.0     NaN        ham   eggs  coffee
                        3.0        ham   eggs  coffee
                                 data1  data2   data3
tunnel1 tunnel2 tunnel3 tunnel4
3.0     NaN     NaN     NaN        ham   eggs  coffee
        2.0     NaN     NaN        ham   eggs  coffee
                        NaN        ham   eggs  coffee
                                 data1  data2   data3
tunnel1 tunnel2 tunnel3 tunnel4
4.0     1.0     4.0     NaN        ham   eggs  coffee
                                 data1  data2   data3
tunnel1 tunnel2 tunnel3 tunnel4
6.0     5.0     5.0     2.0        ham   eggs  coffee
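Note that groupby(level=0) splits on tunnel1 alone, so rows that differ in tunnel2 to tunnel4 still share a group. To get one piece per full configuration, as the question asks, one option is to group the flat DataFrame (before set_index) on all four tunnel columns. By default groupby drops keys that contain NaN, so this sketch passes dropna=False, which needs pandas 1.1 or later; grouping the flat frame also keeps the original row numbers seen in the desired output:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Same values as the question, still without the MultiIndex.
my_df = pd.DataFrame(
    {
        "tunnel1": [3, 3, 3, 4, 1, 1, 6],
        "tunnel2": [nan, 2, 2, 1, 5, 5, 5],
        "tunnel3": [nan, nan, nan, 4, 1, 1, 5],
        "tunnel4": [nan, nan, nan, nan, nan, 3, 2],
    },
    dtype=float,
).assign(data1="ham", data2="eggs", data3="coffee")

keys = ["tunnel1", "tunnel2", "tunnel3", "tunnel4"]
# dropna=False keeps configurations whose key contains NaN;
# sort=False asks for the groups in order of first appearance.
mini_dfs = [group for _, group in my_df.groupby(keys, dropna=False, sort=False)]

for mini_df in mini_dfs:
    print(mini_df)
```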
Upvotes: 1
Reputation: 323306
Why not try replace or fillna to turn the NaN into the string 'NaN' first?
my_df = my_df.fillna('NaN').set_index(['tunnel1', 'tunnel2', 'tunnel3', 'tunnel4'])
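Building on that idea, a minimal sketch of the whole loop. Two choices here are made for this sketch rather than taken from the suggestion above: the sentinel is applied to a copy of the key columns only (so each piece still shows the real NaN values), and the pieces come from groupby on the filled keys instead of .loc, which sidesteps label lookups on a mixed float/string index:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Same values as the question, still without the MultiIndex.
my_df = pd.DataFrame(
    {
        "tunnel1": [3, 3, 3, 4, 1, 1, 6],
        "tunnel2": [nan, 2, 2, 1, 5, 5, 5],
        "tunnel3": [nan, nan, nan, 4, 1, 1, 5],
        "tunnel4": [nan, nan, nan, nan, nan, 3, 2],
    },
    dtype=float,
).assign(data1="ham", data2="eggs", data3="coffee")

keys = ["tunnel1", "tunnel2", "tunnel3", "tunnel4"]
# Fill a copy of the key columns; object dtype lets floats and the
# string 'NaN' share a column, and 'NaN' == 'NaN' is True.
filled = my_df[keys].astype(object).fillna("NaN")

# Group the untouched rows by the filled keys; sort=False keeps
# the groups in order of first appearance.
mini_dfs = [
    group for _, group in my_df.groupby([filled[k] for k in keys], sort=False)
]

for mini_df in mini_dfs:
    print(mini_df)
```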
Upvotes: 2