Reputation: 1663
Say I have the following multi-index dataframe:
import numpy as np
import pandas as pd
arrays = [np.array(['bar', 'bar', 'bar', 'bar', 'foo', 'foo', 'foo', 'foo']),
          pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04', '2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04'])]
df = pd.DataFrame(np.zeros((8, 4)), index=arrays)
                  0    1    2    3
bar 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0
foo 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0
How do I select only the part of this dataframe where the first index level is 'bar' and the date is after 2020-01-02, so that I can add 1 to that part?
To be clearer, the expected output would be:
                  0    1    2    3
bar 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  1.0  1.0  1.0  1.0
    2020-01-04  1.0  1.0  1.0  1.0
foo 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0
I managed to slice it by the first index level:
df.loc['bar']
but then I can't apply the condition on the date.
Upvotes: 3
Views: 371
Reputation: 3917
Another option is to parametrize IndexSlice directly, without running query as in @sammywemmy's answer (and you don't need to name your index levels either):
import datetime as dt
idx = pd.IndexSlice
df.loc[idx["bar", dt.datetime(2020, 1, 3):], :] = 1  # from 2020-01-03 on (inclusive); += 1 would add instead of overwrite
df
#                   0    1    2    3
# bar 2020-01-01  0.0  0.0  0.0  0.0
#     2020-01-02  0.0  0.0  0.0  0.0
#     2020-01-03  1.0  1.0  1.0  1.0
#     2020-01-04  1.0  1.0  1.0  1.0
# foo 2020-01-01  0.0  0.0  0.0  0.0
#     2020-01-02  0.0  0.0  0.0  0.0
#     2020-01-03  0.0  0.0  0.0  0.0
#     2020-01-04  0.0  0.0  0.0  0.0
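A self-contained sketch of the same IndexSlice selection, assuming you want to add 1 to the existing values (as the question asks) rather than overwrite them with 1. Range-slicing a MultiIndex level requires a sorted index, so the sketch calls sort_index() first; for this particular frame it does not change the row order.

```python
import datetime as dt

import numpy as np
import pandas as pd

arrays = [
    np.array(["bar"] * 4 + ["foo"] * 4),
    pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"] * 2),
]
df = pd.DataFrame(np.zeros((8, 4)), index=arrays)

# Slicing a level by range needs a lexsorted MultiIndex; this frame already is,
# but sorting keeps the sketch safe if the rows arrive in another order.
df = df.sort_index()

idx = pd.IndexSlice
# += reads the selected block, adds 1 and writes it back, so non-zero
# starting values would be incremented rather than replaced.
df.loc[idx["bar", dt.datetime(2020, 1, 3):], :] += 1
print(df)
```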
With this approach, you can easily change your parametrization:
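df.loc[idx[["bar", "foo"], dt.datetime(2020, 1, 3):], :] = 1  # both first-level labels, from 2020-01-03
df.loc[idx["bar", "2020-01-03":], :] = 1  # date given as a string
df.loc[idx["bar", dt.datetime(2020, 1, 2):dt.datetime(2020, 1, 3)], :] = 1  # closed range, both ends included
df.loc[idx["bar", dt.datetime(2020, 1, 3)], :] = 1  # a single date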
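To check which rows a given key touches before assigning anything, one option is to read the same key with .loc and count the rows. A minimal sketch; the dictionary labels are just illustrative names, not anything pandas requires:

```python
import datetime as dt

import numpy as np
import pandas as pd

arrays = [
    np.array(["bar"] * 4 + ["foo"] * 4),
    pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"] * 2),
]
df = pd.DataFrame(np.zeros((8, 4)), index=arrays)
idx = pd.IndexSlice

# Illustrative labels mapped to the IndexSlice keys from the answer.
variants = {
    "bar and foo, from 2020-01-03": idx[["bar", "foo"], dt.datetime(2020, 1, 3):],
    "bar, from '2020-01-03' (string)": idx["bar", "2020-01-03":],
    "bar, 2020-01-02 to 2020-01-03": idx["bar", dt.datetime(2020, 1, 2):dt.datetime(2020, 1, 3)],
}

# Reading (not assigning) shows how many rows each key selects;
# label-based slice ends are inclusive.
counts = {label: len(df.loc[key, :]) for label, key in variants.items()}
for label, n in counts.items():
    print(f"{label}: {n} rows")
```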
Upvotes: 0
Reputation: 28644
# give the index levels names
df.index = df.index.set_names(["names", "dates"])
# get the index of the rows that match the condition
index = df.query('names == "bar" and dates > "2020-01-02"').index
# assign 1 to those rows, in all columns
# IndexSlice makes slicing MultiIndexes easier; here it is arguably overkill,
# since idx[index] is just index, so df.loc[index, :] = 1 does the same
idx = pd.IndexSlice
df.loc[idx[index], :] = 1
                    0    1    2    3
names dates
bar   2020-01-01  0.0  0.0  0.0  0.0
      2020-01-02  0.0  0.0  0.0  0.0
      2020-01-03  1.0  1.0  1.0  1.0
      2020-01-04  1.0  1.0  1.0  1.0
foo   2020-01-01  0.0  0.0  0.0  0.0
      2020-01-02  0.0  0.0  0.0  0.0
      2020-01-03  0.0  0.0  0.0  0.0
      2020-01-04  0.0  0.0  0.0  0.0
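A variant of the query approach, assuming you would rather not rename the levels of df itself: the names are set only on a renamed copy via rename_axis, and MultiIndex.isin turns the matching rows into a boolean mask, so += 1 can increment the values instead of overwriting them. This is a sketch, not the answer's own code:

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar"] * 4 + ["foo"] * 4),
    pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"] * 2),
]
df = pd.DataFrame(np.zeros((8, 4)), index=arrays)

# Name the levels on a renamed copy only; df keeps its unnamed index.
rows = (
    df.rename_axis(["names", "dates"])
    .query('names == "bar" and dates > "2020-01-02"')
    .index
)

# isin matches the (name, date) pairs by value, ignoring level names,
# and yields a boolean row mask; += adds 1 to the selected rows.
mask = df.index.isin(rows)
df.loc[mask, :] += 1
print(df)
```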
Upvotes: 0
Reputation: 862581
You can compare each level separately, combine the two masks, and then set 1 with DataFrame.loc, where : selects all columns:
m1 = df.index.get_level_values(0) == 'bar'        # first level is 'bar'
m2 = df.index.get_level_values(1) > '2020-01-02'  # date is after 2020-01-02
df.loc[m1 & m2, :] = 1
print(df)
                  0    1    2    3
bar 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  1.0  1.0  1.0  1.0
    2020-01-04  1.0  1.0  1.0  1.0
foo 2020-01-01  0.0  0.0  0.0  0.0
    2020-01-02  0.0  0.0  0.0  0.0
    2020-01-03  0.0  0.0  0.0  0.0
    2020-01-04  0.0  0.0  0.0  0.0
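A sketch of the same two-mask approach, assuming the goal is to add 1 to the existing values rather than set them to 1. It starts from non-zero data so the difference is visible, and it compares the date level against an explicit pd.Timestamp instead of a string:

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar"] * 4 + ["foo"] * 4),
    pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"] * 2),
]
# Non-zero starting values (0.0 ... 31.0) make "add 1" distinguishable from "set to 1".
df = pd.DataFrame(np.arange(32, dtype=float).reshape(8, 4), index=arrays)
before = df.copy()

m1 = df.index.get_level_values(0) == "bar"                      # first level is 'bar'
m2 = df.index.get_level_values(1) > pd.Timestamp("2020-01-02")  # date after 2020-01-02
df.loc[m1 & m2, :] += 1  # increment the matching rows in every column
print(df)
```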
Upvotes: 3