Reputation: 47
I have a dataframe with a multiindex, and I need to drop the first row from each piece of the first index.
cshoc prccd mkt_val return
gvkey datadate
001075 2019-01-29 112080000.0 87.36 9.791309e+09 0.006916 # <-- DROP THIS ROW
2019-01-30 112080000.0 87.62 9.820450e+09 0.002976
2019-01-31 112080000.0 88.12 9.876490e+09 0.005706
2019-02-01 112080000.0 87.65 9.823812e+09 -0.005334
2019-02-04 112080000.0 87.53 9.810362e+09 -0.001369
... ... ... ... ...
316056 2020-01-21 92917000.0 127.14 1.181347e+10 -0.007107 # <-- AND THIS ROW
2020-01-22 92917000.0 129.45 1.202811e+10 0.018169
2020-01-23 92917000.0 129.80 1.206063e+10 0.002704
2020-01-24 92917000.0 128.00 1.189338e+10 -0.013867
2020-01-27 92917000.0 127.67 1.186271e+10 -0.002578`
For example, the code should drop the rows at datadate = '2019-01-29' and '2020-01-21'. The first row will always have a different value of the index 'datadate', so I can't choose specific values to drop.
I have tried methods like .iloc[:,1:]
, .iloc[:,:,1:]
, and things with IndexSlice. So far, nothing has worked.
Upvotes: 0
Views: 60
Reputation: 25239
You may also try this
df[df.index.get_level_values(0).duplicated()]
Out[262]:
cshoc prccd mkt_val return
gvkey datadate
1075 2019-01-30 112080000.0 87.62 9.820450e+09 0.002976
2019-01-31 112080000.0 88.12 9.876490e+09 0.005706
2019-02-01 112080000.0 87.65 9.823812e+09 -0.005334
2019-02-04 112080000.0 87.53 9.810362e+09 -0.001369
316056 2020-01-22 92917000.0 129.45 1.202811e+10 0.018169
2020-01-23 92917000.0 129.80 1.206063e+10 0.002704
2020-01-24 92917000.0 128.00 1.189338e+10 -0.013867
2020-01-27 92917000.0 127.67 1.186271e+10 -0.002578`
Note: as @piRSquared's comment, on one-row group of index level=0, this solution will not drop it. It still keeps that one row while @piRSquared's solution will drop that group completely.
Upvotes: 1