Rabe
Rabe

Reputation: 87

Fetch rows where certain criteria meets in pandas dataframe

I need to fetch the rows per code (where serial is 2) where value of new (where serial is 2) >= value of sold (where serial is 1) and value of new (where serial is 2) < value of sold (where serial is 2)

Formula: new[2] >= sold[1] and new[2] < sold[2] , [2]/[1] are the serial (and thats why I tried to put the index on serial).

Sample Dataframe (data): data.set_index('serial')

code serial date new sold
0 20113 1 2019-01-30 1344.4 1344.95
1 20113 2 2019-02-30 1345.35 1344.9
2 20113 3 2019-03-30 1347.4 1345.35
3 20286 1 2019-01-30 590.55 590.15
4 20286 2 2019-02-30 590.15 590.55
5 20286 3 2019-03-30 590.4 590.15

It gives an error:

File "C:#########\Python39\lib\site-packages\pandas\core\indexes\base.py", line 3361, in get_loc return self._engine.get_loc(casted_key) File "pandas_libs\index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc File "pandas_libs\index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc File "pandas_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item File "pandas_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: True

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "F:\python_projects#########\testing.py", line 12, in status = globals()[f"strategy_{row[0]}"].pre_check_condition_panda(5,now)#(row[2],now) File "F:\python_projects#########\strategy.py", line 88, in pre_check_condition_panda data1 = data[(data.open[2] >= data.close[1]) & (data.open[2] > data.close[2])] File "C:#########\Python39\lib\site-packages\pandas\core\frame.py", line 3455, in getitem indexer = self.columns.get_loc(key) File "C:#########\Python39\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc raise KeyError(key) from err KeyError: True

My code:

data1 = data[(data.new[2] >= data.sold[1]) & (data.new[2] < data.sold[2])]
print(data1)

Expected Result:

code serial date new sold
4 20286 2 2019-02-30 590.15 590.55

Upvotes: 0

Views: 778

Answers (1)

mozway
mozway

Reputation: 262304

Using stack/unstack you can get the "code" that matches the requirement, but the condition to get only serial == 2 is unclear:

df2 = df.set_index(['code', 'serial']).unstack()
(df2.loc[df2[('new', 2)].ge(df2[('sold', 1)])
        &df2[('new', 2)].lt(df2[('sold', 2)])
        ]
    .stack(level=1)
    .reset_index()
)

output:

    code  serial        date     new    sold
0  20286       1  2019-01-30  590.55  590.15
1  20286       2  2019-02-30  590.15  590.55
2  20286       3  2019-03-30  590.40  590.15

If you only want serial == 2, you can add .query('serial == 2'):

df2 = df.set_index(['code', 'serial']).unstack()
(df2.loc[df2[('new', 2)].ge(df2[('sold', 1)])
        &df2[('new', 2)].lt(df2[('sold', 2)])
        ]
    .stack(level=1)
    .reset_index()
    .query('serial == 2')
)

output:

    code  serial        date     new    sold
1  20286       2  2019-02-30  590.15  590.55

Upvotes: 2

Related Questions