It seems that the data source I'm pulling from (an API) has a weird '-' symbol that isn't recognized when I call str.replace. Here's the code and the library I used. The error occurs on pd.to_numeric; casting to float raises the same error, just without the position.
Y = xy['QPerf'].str.rstrip('%')
Y = Y.str.replace('-', '-')
Y = pd.to_numeric(Y)
Y = Y.apply(lambda x: 1 if x > 0 else 0)
print(Y)
I have tried str.encode('UTF-8').str.decode('UTF-8'), but unsurprisingly it doesn't work.
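To see which character the API actually returns, one option is to print the Unicode code point and name of every non-digit character in the column. The sample values below are hypothetical (a plain hyphen placeholder, a Unicode minus sign, and ordinary percentages); with the real data you would pass xy['QPerf'] instead. The helper describe_chars is illustrative, not part of any library.

```python
import unicodedata

import pandas as pd


def describe_chars(values):
    """Return (char, code point, Unicode name) for each distinct non-digit character."""
    seen = {}
    for value in values.dropna().astype(str):
        for ch in value:
            # Digits are uninteresting here; record every other character once.
            if not ch.isdigit() and ch not in seen:
                seen[ch] = (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
    return list(seen.values())


# Hypothetical sample: '-' placeholder, a U+2212 minus sign, and normal values.
sample = pd.Series(["12.50%", "-", "\u22123.10%", "-4.00%"])
for row in describe_chars(sample):
    print(row)
```

If the lone placeholder shows up as U+002D HYPHEN-MINUS, it is an ordinary hyphen and encoding is not the issue. A character such as U+2212 MINUS SIGN, on the other hand, could be normalized with Y.str.replace('\u2212', '-') before calling pd.to_numeric.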
Here is the library code to get your own data to try this on.
from finvizfinance.quote import finvizfinance
from finvizfinance.screener.overview import Overview
stock = finvizfinance('TSLA')
stock_fundament = stock.TickerFundament()
qperf = stock_fundament['Perf Quarter']
This will return a dataframe.
You can always ignore errors and replace them with NaNs in pd.to_numeric by passing the errors='coerce' parameter. That's likely what - means here too: it's not a number, it represents missing data.
Y = pd.to_numeric(xy['QPerf'].str.rstrip('%'), errors='coerce')
This has the downside of also silencing any other parse errors, which may make you miss formatting problems that you would like to know about.
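A small self-contained illustration of errors='coerce', using made-up values in place of the finviz data: the '-' placeholder becomes NaN as intended, but so does a malformed entry such as '1.2.3%', which is exactly the downside of this approach.

```python
import pandas as pd

# Hypothetical stand-in for xy; the real data comes from finvizfinance.
xy = pd.DataFrame({"QPerf": ["12.50%", "-", "-4.00%", "1.2.3%"]})

# errors='coerce' turns anything that cannot be parsed into NaN instead of raising.
Y = pd.to_numeric(xy["QPerf"].str.rstrip("%"), errors="coerce")
print(Y)

# Both the '-' placeholder (row 1) and the malformed '1.2.3' (row 3) are NaN,
# so the formatting error goes unnoticed.
print(Y.isna().tolist())
```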
If you were reading from a CSV file, you could use na_values to specify that - means NaN. In this context, we can use .mask() to replace the - with NaNs and then use to_numeric:
Y = pd.to_numeric(xy['QPerf'].str.rstrip('%').mask(xy['QPerf'] == '-'))
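Both approaches from the last paragraph can be sketched end to end as follows. The CSV text and the frame contents are made up, purely to show na_values=['-'] in read_csv and the .mask() variant on an in-memory DataFrame. Unlike errors='coerce', both still raise if a value is malformed. The last step reproduces the question's 1/0 labelling with a vectorized comparison instead of apply; NaN > 0 evaluates to False, so missing values map to 0, just as with the lambda.

```python
import io

import pandas as pd

# Made-up CSV text standing in for the API data.
csv_text = "ticker,QPerf\nAAA,12.50%\nBBB,-\nCCC,-4.00%\n"

# Option 1: when reading a CSV, treat a bare '-' as missing while parsing.
from_csv = pd.read_csv(io.StringIO(csv_text), na_values=["-"])
Y_csv = pd.to_numeric(from_csv["QPerf"].str.rstrip("%"))

# Option 2: data already in a DataFrame; mask the '-' placeholder to NaN first.
xy = pd.DataFrame({"QPerf": ["12.50%", "-", "-4.00%"]})
Y = pd.to_numeric(xy["QPerf"].str.rstrip("%").mask(xy["QPerf"] == "-"))

# Same labelling as apply(lambda x: 1 if x > 0 else 0): NaN > 0 is False, so NaN -> 0.
labels = (Y > 0).astype(int)
print(labels.tolist())
```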