selecting rows in a pandas dataframe based on dictionary membership

Question

Suppose I have a collection of stock events stored as a dataframe, which looks like

  date     ticker price
0 2017-1-2 'AAPL' 130.00
1.2017-1-2 'ZNGA' 2.82

(etc)

And I want to select only those rows corresponding to stocks in the S&P500. The obvious way is to create a dictionary sp500dict whose keys are S&P500 names, and then do something like df[df['ticker'] in sp500dict]. However, this (and a number of other schemes I had tried) fails, in this case as follows:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

Any suggestions? There is a horrible kludge with creating a dataframe whose rows contain the elements in the dictionary and then doing a join, but that seems a little extreme.

MaxU - stand with Ukraine · Accepted Answer

Try this:

url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
sp500 = pd.read_html(url)[0].iloc[1:, 0].str.replace('\.', '-')


In [66]: df[df['ticker'].isin(sp500)]
Out[66]:
       date ticker  price
0  2017-1-2   AAPL  130.0

Timing for 200.000 rows DF:

In [102]: df = pd.concat([df] * 10**5, ignore_index=True)

In [103]: df.shape
Out[103]: (200000, 3)

In [104]: s = sp500.to_frame('ticker')

In [105]: %timeit df[df['ticker'].isin(sp500)]
10 loops, best of 3: 42.4 ms per loop

In [106]: %timeit pd.merge(df, s)
10 loops, best of 3: 50.2 ms per loop

selecting rows in a pandas dataframe based on dictionary membership

Answers (1)

Related Questions