Reputation: 31526
I am following a YouTube tutorial and I wrote this code from the tutorial
import numpy as np
import pandas as pd
from scipy.stats import percentileofscore as score
my_columns = [
    'Ticker',
    'Price',
    'Number of Shares to Buy',
    'One-Year Price Return',
    'One-Year Percentile Return',
    'Six-Month Price Return',
    'Six-Month Percentile Return',
    'Three-Month Price Return',
    'Three-Month Percentile Return',
    'One-Month Price Return',
    'One-Month Percentile Return'
]
final_df = pd.DataFrame(columns = my_columns)
# populate final_df here....
pd.set_option('display.max_columns', None)
print(final_df[:1])
time_periods = ['One-Year', 'Six-Month', 'Three-Month', 'One-Month']
for row in final_df.index:
    for time_period in time_periods:
        change_col = f'{time_period} Price Return'
        print(type(final_df[change_col]))
        percentile_col = f'{time_period} Percentile Return'
        print(final_df.loc[row, change_col])
        final_df.loc[row, percentile_col] = score(final_df[change_col], final_df.loc[row, change_col])
print(final_df)
It prints my data frame as
| Ticker | Price | Number of Shares to Buy | One-Year Price Return | One-Year Percentile Return | Six-Month Price Return | Six-Month Percentile Return | Three-Month Price Return | Three-Month Percentile Return | One-Month Price Return | One-Month Percentile Return |
|--------|---------|-------------------------|------------------------|----------------------------|------------------------|-----------------------------|--------------------------|-------------------------------|-------------------------|------------------------------|
| A | 120.38 | N/A | 0.437579 | N/A | 0.280969 | N/A | 0.198355 | N/A | 0.0455988 | N/A |
But when I call the score function I get this error
<class 'pandas.core.series.Series'>
0.4320217937551543
Traceback (most recent call last):
File "program.py", line 72, in <module>
final_df.loc[row, percentile_col] = score(final_df[change_col], final_df.loc[row, change_col])
File "/Users/abhisheksrivastava/Library/Python/3.7/lib/python/site-packages/scipy/stats/stats.py", line 2017, in percentileofscore
left = np.count_nonzero(a < score)
TypeError: '<' not supported between instances of 'NoneType' and 'float'
What is going wrong? I see the same code work in the YouTube video. I have next to no experience with Python.
Edit:
I also tried
print(type(final_df['One-Year Price Return']))
print(type(final_df['Six-Month Price Return']))
print(type(final_df['Three-Month Price Return']))
print(type(final_df['One-Month Price Return']))
for row in final_df.index:
    final_df.loc[row, 'One-Year Percentile Return'] = score(final_df['One-Year Price Return'], final_df.loc[row, 'One-Year Price Return'])
    final_df.loc[row, 'Six-Month Percentile Return'] = score(final_df['Six-Month Price Return'], final_df.loc[row, 'Six-Month Price Return'])
    final_df.loc[row, 'Three-Month Percentile Return'] = score(final_df['Three-Month Price Return'], final_df.loc[row, 'Three-Month Price Return'])
    final_df.loc[row, 'One-Month Percentile Return'] = score(final_df['One-Month Price Return'], final_df.loc[row, 'One-Month Price Return'])
print(final_df)
but it still gets the same error
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
Traceback (most recent call last):
File "program.py", line 71, in <module>
final_df.loc[row, 'One-Year Percentile Return'] = score(final_df['One-Year Price Return'], final_df.loc[row, 'OneYear Price Return'])
File "/Users/abhisheksrivastava/Library/Python/3.7/lib/python/site-packages/scipy/stats/stats.py", line 2017, in percentileofscore
left = np.count_nonzero(a < score)
TypeError: '<' not supported between instances of 'NoneType' and 'float'
Upvotes: 11
Views: 5752
Reputation: 672
Basically, I converted each series to float, which turns the None entries into NaN, and then filled those with 0, as follows:
mementum = [
    'One-Year',
    'Six-Month',
    'Three-Month',
    'One-Month'
]
for period in mementum:
    hq_df[f'{period} Price Return'] = hq_df[f'{period} Price Return'].astype(float).fillna(0.0)
for row in hq_df.index:
    for period in mementum:
        hq_df.loc[row, f'{period} Return Percentile'] = stats.percentileofscore(hq_df[f'{period} Price Return'], hq_df.loc[row, f'{period} Price Return'])
Upvotes: 0
Reputation: 31
Simply replace the None values with 0, as follows:
hqm_dataframe.fillna(0,inplace=True)
Upvotes: 3
Reputation: 1
final_df = pd.DataFrame(columns = my_columns)
for symbol_string in symbol_strings:
    batch_api_call_url = f'https://sandbox.iexapis.com/stable/stock/market/batch?symbols={symbol_string}&types=price,stats&token={IEX_CLOUD_API_TOKEN}'
    data = requests.get(batch_api_call_url).json()
    # print(symbol_string.split(','))
    # print(data['AAPL']['stats'])
    for symbol in symbol_string.split(','):
        final_df = final_df.append(
            pd.Series(
                [
                    symbol,
                    data[symbol]['price'],
                    data[symbol]['stats']['year1ChangePercent'],
                    np.nan
                ],
                index = my_columns
            ),
            ignore_index=True
        )
hqm_df = pd.DataFrame(columns = hqm_columns)
for symbol_string in symbol_strings:
    batch_api_call_url = f'https://sandbox.iexapis.com/stable/stock/market/batch?symbols={symbol_string}&types=price,stats&token={IEX_CLOUD_API_TOKEN}'
    data = requests.get(batch_api_call_url).json()
    for symbol in symbol_string.split(','):
        hqm_df = hqm_df.append(
            pd.Series(
                [
                    symbol,
                    data[symbol]['price'],
                    np.nan,
                    data[symbol]['stats']['year1ChangePercent'],
                    np.nan,
                    data[symbol]['stats']['month6ChangePercent'],
                    np.nan,
                    data[symbol]['stats']['month3ChangePercent'],
                    np.nan,
                    data[symbol]['stats']['month1ChangePercent'],
                    np.nan
                ],
                index = hqm_columns
            ),
            ignore_index=True
        )
hqm_df['One-Year Price Return'] = hqm_df['One-Year Price Return'].astype('float')
hqm_df['Six-Month Price Return'] = hqm_df['Six-Month Price Return'].astype('float')
hqm_df['Three-Month Price Return'] = hqm_df['Three-Month Price Return'].astype('float')
hqm_df['One-Month Price Return'] = hqm_df['One-Month Price Return'].astype('float')
Upvotes: 0
Reputation: 81
Most of the other replies are correct: the issue is that there are None values in the dataframe, and the percentileofscore function from scipy.stats cannot compare them with floats. I have a different solution that doesn't involve looping over every entry in the dataframe.
I used the .replace method of dataframes to replace all the None entries with 0. The inplace = True is there so that the change is made on the dataframe itself, instead of the result having to be assigned back.
hqm_dataframe.replace([None], 0, inplace = True)
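To see why the None entries matter, here is a minimal sketch that reproduces the failing comparison from the traceback (`a < score`) directly with NumPy rather than through scipy, and then cleans the column. The sample values and the names `returns`, `cleaned`, and `values_below` are made up for illustration; `pd.to_numeric` with `errors="coerce"` is used here as one possible way to get a float column, not something the tutorial itself does.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one return is missing, as in the API data described above.
returns = pd.Series([0.43, None, 0.28], dtype=object)

# The traceback's failing line is `a < score`; on an object array that
# comparison is applied element by element, so None < 0.3 raises TypeError.
try:
    np.count_nonzero(returns.to_numpy() < 0.3)
    comparison_failed = False
except TypeError as exc:
    comparison_failed = True
    print(f"TypeError: {exc}")

# Coerce to float (None becomes NaN), then fill the gaps with 0.0.
cleaned = pd.to_numeric(returns, errors="coerce").fillna(0.0)
values_below = int(np.count_nonzero(cleaned.to_numpy() < 0.3))
print(cleaned.tolist(), values_below)
```

Once the column holds only floats, the same comparison succeeds, which is why every fix in this thread comes down to getting rid of the None entries before calling percentileofscore.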
Upvotes: 0
Reputation: 21
After populating final_df, it's also possible to do:
final_df.fillna(value=0, inplace=True)
if you just want to replace each NaN with 0.
Upvotes: 2
Reputation: 161
What @Taras Mogetich wrote was pretty much correct; however, you might need to put the if-statement in its own for-loop. Like so:
for row in hqm_dataframe.index:
    for time_period in time_periods:
        change_col = f'{time_period} Price Return'
        percentile_col = f'{time_period} Return Percentile'
        # Replace missing returns first, so the second loop only sees numbers
        if hqm_dataframe.loc[row, change_col] is None:
            hqm_dataframe.loc[row, change_col] = 0.0
And then separately:
for row in hqm_dataframe.index:
    for time_period in time_periods:
        change_col = f'{time_period} Price Return'
        percentile_col = f'{time_period} Return Percentile'
        hqm_dataframe.loc[row, percentile_col] = score(hqm_dataframe[change_col], hqm_dataframe.loc[row, change_col])
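As a possible alternative to the second loop, once the None values are gone the percentiles can be computed per column with pandas' `Series.rank(pct=True)`. This is a different technique from the tutorial's row-by-row `percentileofscore` call; as far as I can tell it gives the same numbers as the default `kind='rank'` when the column has no missing values, but treat that as an assumption to check on your own data. The small dataframe below is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for hqm_dataframe, with None already replaced by 0.0.
hqm_dataframe = pd.DataFrame({
    "Ticker": ["A", "B", "C", "D"],
    "One-Year Price Return": [0.1, 0.4, 0.2, 0.0],
    "One-Month Price Return": [0.03, 0.01, 0.02, 0.04],
})

time_periods = ["One-Year", "One-Month"]
for time_period in time_periods:
    change_col = f"{time_period} Price Return"
    percentile_col = f"{time_period} Return Percentile"
    # rank(pct=True) returns each value's average rank divided by the row count.
    hqm_dataframe[percentile_col] = hqm_dataframe[change_col].rank(pct=True) * 100

print(hqm_dataframe)
```

This avoids the nested loops over rows, which may matter once the dataframe holds several hundred tickers.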
Upvotes: 16
Reputation: 61
Funny to google the problem I'm having and it's literally the exact same tutorial you're working through!
As mentioned, some data from the API call has a value of None, which causes an error with the percentileofscore function. My solution is to convert all None type to integer 0 upon initial creation of the hqm_dataframe.
hqm_columns = [
    'Ticker',
    'Price',
    'Number of Shares to Buy',
    'One-Year Price Return',
    'One-Year Return Percentile',
    'Six-Month Price Return',
    'Six-Month Return Percentile',
    'Three-Month Price Return',
    'Three-Month Return Percentile',
    'One-Month Price Return',
    'One-Month Return Percentile'
]
hqm_dataframe = pd.DataFrame(columns=hqm_columns)
convert_none = lambda x : 0 if x is None else x
for symbol_string in symbol_strings:
    batch_api_call_url = f'https://sandbox.iexapis.com/stable/stock/market/batch?symbols={symbol_string}&types=price,stats&token={IEX_CLOUD_API_TOKEN}'
    data = requests.get(batch_api_call_url).json()
    for symbol in symbol_string.split(','):
        hqm_dataframe = hqm_dataframe.append(
            pd.Series(
                [
                    symbol,
                    data[symbol]['price'],
                    'N/A',
                    convert_none(data[symbol]['stats']['year1ChangePercent']),
                    'N/A',
                    convert_none(data[symbol]['stats']['month6ChangePercent']),
                    'N/A',
                    convert_none(data[symbol]['stats']['month3ChangePercent']),
                    'N/A',
                    convert_none(data[symbol]['stats']['month1ChangePercent']),
                    'N/A'
                ],
                index = hqm_columns
            ),
            ignore_index=True
        )
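Note that `DataFrame.append` was removed in pandas 2.0, so on a recent install the loop above fails before it ever reaches the None problem. Below is a minimal sketch of the same idea that collects plain rows in a list and builds the dataframe once. The `data` dictionary is a made-up stand-in for the API response (the ticker XYZ and all numbers are hypothetical), and the column list is shortened to keep the example small.

```python
import pandas as pd

hqm_columns = ["Ticker", "Price", "One-Year Price Return", "One-Year Return Percentile"]

# Hypothetical payload shaped like the batch response used above.
data = {
    "AAPL": {"price": 130.0, "stats": {"year1ChangePercent": 0.8}},
    "XYZ": {"price": 12.5, "stats": {"year1ChangePercent": None}},
}

def convert_none(value):
    """Return 0 for a missing (None) value, otherwise the value unchanged."""
    return 0 if value is None else value

# Build every row first, then create the dataframe in a single call.
rows = [
    [symbol, data[symbol]["price"], convert_none(data[symbol]["stats"]["year1ChangePercent"]), "N/A"]
    for symbol in data
]
hqm_dataframe = pd.DataFrame(rows, columns=hqm_columns)
print(hqm_dataframe)
```

Building the dataframe once from a list is also generally faster than appending one row at a time.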
Upvotes: 6
Reputation: 136
I'm working through this tutorial as well. I looked deeper into the data in the four '___ Price Return' columns. In my batch API call, there are four rows that have the value None instead of a float, which is why the TypeError appears: the percentileofscore function is trying to calculate the percentiles using None, which isn't a float.
To work around this API issue, I manually changed the None values to 0, which allowed the percentiles to be calculated, with the code below:
time_periods = [
    'One-Year',
    'Six-Month',
    'Three-Month',
    'One-Month'
]
for row in hqm_dataframe.index:
    for time_period in time_periods:
        # Replace a missing return with 0 so it can be compared with floats
        if hqm_dataframe.loc[row, f'{time_period} Price Return'] is None:
            hqm_dataframe.loc[row, f'{time_period} Price Return'] = 0
Upvotes: 12
Reputation: 618
Are you sure that this is the whole code? It returns an empty dataframe in my case. Please provide more details.
Upvotes: 0