Reputation: 4864
Suppose I have a collection of stock events stored as a dataframe, which looks like
date ticker price
0 2017-1-2 'AAPL' 130.00
1.2017-1-2 'ZNGA' 2.82
(etc)
And I want to select only those rows corresponding to stocks in the S&P500.
The obvious way is to create a dictionary sp500dict
whose keys are S&P500 names, and then do something like df[df['ticker'] in sp500dict]
. However, this (and a number of other schemes I had tried) fails, in this case as follows:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
Any suggestions? There is a horrible kludge with creating a dataframe whose rows contain the elements in the dictionary and then doing a join, but that seems a little extreme.
Upvotes: 0
Views: 1128
Reputation: 210912
Try this:
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
sp500 = pd.read_html(url)[0].iloc[1:, 0].str.replace('\.', '-')
In [66]: df[df['ticker'].isin(sp500)]
Out[66]:
date ticker price
0 2017-1-2 AAPL 130.0
Timing for 200.000 rows DF:
In [102]: df = pd.concat([df] * 10**5, ignore_index=True)
In [103]: df.shape
Out[103]: (200000, 3)
In [104]: s = sp500.to_frame('ticker')
In [105]: %timeit df[df['ticker'].isin(sp500)]
10 loops, best of 3: 42.4 ms per loop
In [106]: %timeit pd.merge(df, s)
10 loops, best of 3: 50.2 ms per loop
Upvotes: 1