Reputation: 414
I'm just starting work with multiframes and I'm having a little trouble with the fairly sparse documentation and online examples on slicing and indexing.
Consider the following mutiframe
import pandas as pd
import numpy as np
levels={
'produce_source':['Vendor A', 'Vendor B'],
'day':['mon','wed','fri'],
'chiller_temp':['low','mid'],
'fruit':['apples','pears','nanas']
}
index = pd.MultiIndex.from_product(levels.values(), names = list(levels.keys()))
df = pd.DataFrame(index=index)
df = df.assign(deliveries=np.random.rand(len(df)))
deliveries
produce_source day chiller_temp fruit
Vendor A mon low apples 0.748376
pears 0.639824
nanas 0.604342
mid apples 0.160837
pears 0.970412
nanas 0.301815
wed low apples 0.572627
pears 0.254242
nanas 0.590702
mid apples 0.153772
pears 0.180117
nanas 0.858085
fri low apples 0.535358
pears 0.576359
nanas 0.893993
mid apples 0.334602
pears 0.053892
nanas 0.778767
Vendor B mon low apples 0.565761
pears 0.437994
nanas 0.090994
mid apples 0.261041
pears 0.028795
nanas 0.057612
wed low apples 0.808108
pears 0.914724
nanas 0.020663
mid apples 0.055319
pears 0.888612
nanas 0.623370
fri low apples 0.419422
pears 0.938593
nanas 0.358441
mid apples 0.534191
pears 0.590103
nanas 0.753034
What's the most pythonic way to achieve the following
1) View all the wed data as a slice
1a) stretch goal: don't care that 'day' is index.names[1], instead index by index name 'day'
2) Write an iterable of data only to that wed slice
3) add a chiller_temp of high for all vendors and days and fruits
I saw some slicing happening using idx = pd.IndexSlice.
idx = pd.IndexSlice
df_wip = df.loc[idx[:,'wed'], ] #1)
#would love to write to df_wip sliced df here but get slice copy warning with df_wip['deliveries'] = list(range(0,100*len(df_wip),100))
df = df.loc[idx[:,'wed'],'deliveries'] = list(range(0,100*len(df_wip),100)) #2)
This raises an error AttributeError: 'list' object has no attribute 'loc'
df = df.loc[idx[:,'wed'],'deliveries'] = pd.Series(range(0,100*len(df_wip),100)) #2)
raises TypeError: unhashable type: 'slice'
Upvotes: 2
Views: 2307
Reputation: 5155
1) View all the wed data as a slice
For viewing data in a multiindex, it's a lot easier to use .xs (cross section), which allows you to specify values for specific index levels instead of making you type out all of the levels like .loc w/ slice will make you do:
df.xs('wed', level='day')
Out:
deliveries
produce_source chiller_temp fruit
Vendor A low apples 0.521861
pears 0.741856
nanas 0.245843
mid apples 0.471135
pears 0.191322
nanas 0.153920
Vendor B low apples 0.711457
pears 0.211794
nanas 0.599071
mid apples 0.303910
pears 0.657348
nanas 0.111750
2) Write an iterable of data only to that wed slice
If I understand this correctly, you're trying to replace the values in the 'deliveries' column with a specific iterable (say a list) where the day is 'wed'. Unfortunately .loc-type replacing doesn't work in this instance. As far as I know, pandas only has easy syntax for replacing the value of a single cell in this way using .at or .loc (see this SO answer). However, we can use iterrows to accomplish this:
idx = pd.IndexSlice
# If we don't change the column's type, which was float, this will error
df['deliveries'] = df['deliveries'].astype(object)
# Loop through rows, replacing single values
# Only necessary if the new assigned value is mutable
for index, row in df.loc[idx[:,'wed'], 'deliveries':'deliveries'].iterrows():
df.at[index, 'deliveries'] = ["We", "changed", "this"]
df.head(10)
Out:
deliveries
produce_source day chiller_temp fruit
Vendor A mon low apples 0.0287606
pears 0.264512
nanas 0.238089
mid apples 0.814985
pears 0.590967
nanas 0.919351
wed low apples [We, changed, this]
pears [We, changed, this]
nanas [We, changed, this]
mid apples [We, changed, this]
While the looping is required to my knowledge, using df.xs and then df.update instead of .loc is more understandable in my option. For example, the following code does the same as the .loc code above:
df['deliveries'] = df['deliveries'].astype(object)
# Create a temporary copy of our cross section
df2 = df.xs('wed', level='day', drop_level=False)
# The same loop as before
for index, row in df2.iterrows():
df2.at[index, 'deliveries'] = ["We", "changed", "this"]
# Update the original df for the values we want from df2
df.update(df2, join="left", overwrite=True, filter_func=None, raise_conflict=False)
3) add a chiller_temp of high for all vendors and days and fruits
Replacing values in existing levels of a multiindex requires replacing the whole level. This can either be accomplished with df.index.set_levels (easier way IMO) or pd.MultiIndex.from_arrays. Depending on the exact use case map and/or replace might be of use. Check out this SO answer for some other examples.
df.index = df.index.set_levels(['high' for v in df.index.get_level_values('chiller_temp')], level='chiller_temp')
4) I saw some slicing happening using idx = pd.IndexSlice...This raises an error AttributeError: 'list' object has no attribute 'loc'...raises TypeError: unhashable type: 'slice'
For the AttributeError: 'list' object has no attribute 'loc'
and TypeError: unhashable type: 'slice'
errors you just have two assignments in those lines.
It appears like your .loc syntax is correct except that you can't assign pd.Series this way without causing the cell value to be NaN (see answer to 2) for correct syntax). This works:
idx = pd.IndexSlice
df.loc[idx[:,'wed'], 'deliveries':'deliveries'] = "We changed this"
Upvotes: 3