Reputation: 701
I have a multindex dataframe with 3 index levels and 2 numerical columns.
A 1 2017-04-01 14.0 87.346878
2017-06-01 4.0 87.347504
2 2014-08-01 1.0 123.110001
2015-01-01 4.0 209.612503
B 3 2014-07-01 1.0 68.540001
2014-12-01 1.0 64.370003
4 2015-01-01 3.0 75.000000
I want to replace the values in first row of 3rd index level wherever a new second level index begins. For ex: every first row
(A,1,2017-04-01)->0.0 0.0
(A,2,2014-08-01)->0.0 0.0
(B,3,2014-07-01)->0.0 0.0
(B,4,2015-01-01)->0.0 0.0
The dataframe is too big and doing it datframe by dataframe like df.xs('A,1')...df.xs(A,2)
gets time consuming. Is there some way where i can get a mask and replace with new values in these positions ?
Upvotes: 5
Views: 2120
Reputation: 30579
You can use the grouper indices
in a simple iloc
:
df.iloc[[a[0] for a in df.groupby(level=[0, 1]).indices.values()]] = 0
Example:
df = pd.DataFrame({'col1': [14., 4., 1., 4., 1., 1., 3.],
'col2': [ 87.346878, 87.347504, 123.110001, 209.612503, 68.540001, 64.370003, 75.]},
index = pd.MultiIndex.from_tuples(([('A', 1, '2017-04-01'), ('A', 1, '2017-06-01'),
('A', 2, '2014-08-01'), ('A', 2, '2015-01-01'),
('B', 3, '2014-07-01'), ('B', 3, '2014-12-01'),
('B', 4, '2015-01-01')])))
Result:
col1 col2
A 1 2017-04-01 0.0 0.000000
2017-06-01 4.0 87.347504
2 2014-08-01 0.0 0.000000
2015-01-01 4.0 209.612503
B 3 2014-07-01 0.0 0.000000
2014-12-01 1.0 64.370003
4 2015-01-01 0.0 0.000000
Timings:
%%timeit
idx = df.reset_index(level=2).groupby(level=[0, 1])['level_2'].first()
idx = pd.MultiIndex.from_arrays(idx.reset_index().to_numpy().T)
df.loc[idx, :] = 0
#6.7 ms ± 40 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df.iloc[[a[0] for a in df.groupby(level=[0, 1]).indices.values()]] = 0
#897 µs ± 6.99 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So this is about 7 times faster than the accepted answer
Upvotes: 1
Reputation: 1
I think you can use something like this:
import pandas as pd
import numpy as np
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
df = pd.DataFrame([['A', 'B'], ['bar', 'two'],
['foo', 'one'], ['foo', 'two']],
columns=['first', 'second'])
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df
You can create a list of unique values from your index. Then get the index position, to replace on your column the row value coincidence with the row value.
lst = ['bar','foo', 'qux']
ls = []
for i in lst:
base = df.index.get_loc(i)
a = base.indices(len(df))
a = a[0]
ls.append(a)
for ii in ls:
#print(ii)
df[0][ii] = 0
df
Fortunately, this can help you.
Cheers!
Upvotes: 0
Reputation: 7509
We can extract a series of the second-level index with:
df.index.get_level_values(1)
# output: Int64Index([1, 1, 2, 2, 3, 3, 4], dtype='int64')
And check where it changes with:
idx = df.index.get_level_values(1)
np.where(idx != np.roll(idx, 1))[0]
# output: array([0, 2, 4, 6])
So we can simply use the returned value of the second statement with iloc
to get the first row of every second-level index and modify their values like this:
idx = df.index.get_level_values(1)
df.iloc[np.where(idx != np.roll(idx, 1))[0]] = 0
output:
value1 value2
A 1 2017-04-01 0.0 0.000000
2017-06-01 4.0 87.347504
2 2014-08-01 0.0 0.000000
2015-01-01 4.0 209.612503
B 3 2014-07-01 0.0 0.000000
2014-12-01 1.0 64.370003
4 2015-01-01 0.0 0.000000
Upvotes: 1
Reputation: 71689
Use DataFrame.reset_index
on level=2
, then use DataFrame.groupby
on level=[0, 1]
and aggregate level_2
using first
, then using pd.MultiIndex.from_arrays
create a multilevel index, finally use this multilevel index to change the values in dataframe:
idx = df.reset_index(level=2).groupby(level=[0, 1])['level_2'].first()
idx = pd.MultiIndex.from_arrays(idx.reset_index().to_numpy().T)
df.loc[idx, :] = 0
Result:
# print(df)
col1 col2
A 1 2017-04-01 0.0 0.000000
2017-06-01 4.0 87.347504
2 2014-08-01 0.0 0.000000
2015-01-01 4.0 209.612503
B 3 2014-07-01 0.0 0.000000
2014-12-01 1.0 64.370003
4 2015-01-01 0.0 0.000000
Upvotes: 1