Igor K.

Reputation: 877

Ignore lines in pandas DataFrame

I have a list called reassembly organized like this:

['AFLT', 228468.0, 'B'],
['TATN', 1108.6, 'B'],
['TATN', 4434.4, 'B'],
['MOEX', 3480.0, 'S'],
['YNDX', 5934.0, 'B'],
['MTSS', 36003.0, 'S'],
['SBERP', 33837.1, 'S'],
['SBERP', 1780.8, 'S'],
['MTSS', 3273.0, 'S'],
['AFLT', 124356.0, 'B'],
['AFLT', 20244.0, 'B'],
['MGNT', 72990.0, 'B'],
['NLMK', 230917.0, 'B'],
['NLMK', 156050.0, 'B'],
['NLMK', 31220.0, 'B'],
['MGNT', 36450.0, 'S'],
['TCSG', 14045.2, 'S'],
['TCSG', 2160.4, 'S'],

There is also a dictionary called medians:

{'TATNP': 11968.05, 'TCSG': 8647.2, 'TRNFP': 130250.0, 'UPRO': 7941.0, 'VTBR': 3828.28, 'YNDX': 17660.4}

The keys of the dictionary correspond to the first value of each list entry ('AFLT', 'VTBR' and others).

I convert reassembly to a pandas DataFrame:

df = pd.DataFrame(reassembly, columns=['ticker','vol','operation'])

Now I want to do something like this:

df = df[df['vol'] < median['ticker']]

That is, if vol is below the median for that row's ticker, the script should ignore the row.

Please help me write this code correctly.

Upvotes: 3

Views: 255

Answers (3)

Richard

Reputation: 1

I suggest solving this with a list comprehension and passing the result to pandas instead.

import pandas as pd

reassembly = [['AFLT', 228468.0, 'B'],
['TATN', 1108.6, 'B'],
['TATN', 4434.4, 'B'],
['MOEX', 3480.0, 'S'],
['YNDX', 5934.0, 'B'],
['MTSS', 36003.0, 'S'],
['SBERP', 33837.1, 'S'],
['SBERP', 1780.8, 'S'],
['MTSS', 3273.0, 'S'],
['AFLT', 124356.0, 'B'],
['AFLT', 20244.0, 'B'],
['MGNT', 72990.0, 'B'],
['NLMK', 230917.0, 'B'],
['NLMK', 156050.0, 'B'],
['NLMK', 31220.0, 'B'],
['MGNT', 36450.0, 'S'],
['TCSG', 14045.2, 'S'],
['TCSG', 2160.4, 'S']]

medians = {'TATNP': 11968.05, 'TCSG': 8647.2, 'TRNFP': 130250.0, 'UPRO': 7941.0, 'VTBR': 3828.28, 'YNDX': 17660.4}

ready_for_panda = [x for x in reassembly if x[0] in medians and x[1] > medians[x[0]]]

pd.DataFrame(ready_for_panda, columns=["ticker", "vol", "operation"])

ticker  vol      operation
TCSG    14045.2  S

I have assumed that you want to filter out any element of reassembly whose volume is less than the median for its ticker. Note that the comprehension also drops any ticker that has no entry in medians.

Upvotes: 0

Quang Hoang

Reputation: 150765

You want map:

high_volumes = df[df['vol'] > df['ticker'].map(medians)]

# do stuff with high-volume transactions

Note that tickers missing from medians map to NaN, so the comparison above is False for them and those rows are dropped. If instead you want to keep all the tickers that are not in medians:

meds = df['ticker'].map(medians)
high_volumes = df[(df['vol']>meds)|(meds.isna())]
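To see the difference between the two filters, here is a minimal sketch on a small subset of the question's data. It assumes pandas is imported as pd; the names strict and lenient are only illustrative and do not come from the answer above.

```python
import pandas as pd

# Subset of the question's data; 'AFLT' deliberately has no entry in medians.
reassembly = [
    ['AFLT', 228468.0, 'B'],
    ['YNDX', 5934.0, 'B'],
    ['TCSG', 14045.2, 'S'],
    ['TCSG', 2160.4, 'S'],
]
medians = {'TCSG': 8647.2, 'YNDX': 17660.4}

df = pd.DataFrame(reassembly, columns=['ticker', 'vol', 'operation'])

# Look up each row's median; tickers absent from the dict become NaN.
meds = df['ticker'].map(medians)

# Comparisons against NaN evaluate to False, so the AFLT row is dropped.
strict = df[df['vol'] > meds]

# OR-ing with isna() keeps rows whose ticker has no known median.
lenient = df[(df['vol'] > meds) | meds.isna()]

print(strict)
print(lenient)
```

With this data, strict should contain only the TCSG row with vol 14045.2, while lenient should also keep the AFLT row.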

Upvotes: 4

Randy

Reputation: 14847

df = df[df['vol'] > df['ticker'].map(medians)]

Upvotes: 2
