Reputation: 595
I'm working on a scraping project with Postgres. I've set up all the tables by hand; they use the typical data types like datetime, integer, text, and float.
But I've run into an issue. Since I scrape many fields, some fields are always missing and end up as NaN. And since NaN has type float, I get an error when trying to push a float value into a column that expects, for instance, a datetime or text object.
At least I believe so; the error message isn't very clear.
I'm basically using Pandas only (df.to_sql) after I've created the tables with SQLAlchemy's declarative_base.
My process is like:
obj = Scrape()     # Scrape() includes all relevant methods
obj.get_data()     # scrape the raw fields
obj.create_a_df()  # build obj.df from the scraped data
obj.df.to_sql('table', con=engine, [...])
As soon as a value within the current scrape (obj.get_data()) doesn't exist, it becomes NaN. And since NaN has type float, I can't post it into a table column that expects anything but a float.
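A quick check confirms this (the column name here is made up for illustration):

import numpy as np
import pandas as pd

print(type(np.nan))                          # <class 'float'>
df = pd.DataFrame({"started_at": [np.nan]})  # a column meant to hold datetimes
print(df.dtypes)                             # float64 -- the missing value forces a float dtype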
Error:
[SQL: INSERT INTO company_statistics (my column name) VALUES (my df_columns)] [parameters: {my data as dictionary}]
(Background on this error at: https://sqlalche.me/e/14/9h9h)
I can't diagnose it very well, but I expect the cause is the one described above. Is there any way around this? Does it make sense to df.fillna(None), or is there a keyword argument available that I'm not aware of?
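Note that df.fillna(None) itself raises a ValueError in pandas. A sketch of one common workaround (the if_exists/index arguments are placeholders for the existing call) is to swap NaN for Python's None just before the insert, so the database receives NULL:

import numpy as np

# Replace every NaN with Python's None; the affected columns become
# dtype "object", and to_sql then writes NULL instead of a float NaN.
obj.df = obj.df.replace({np.nan: None})
obj.df.to_sql('table', con=engine, if_exists='append', index=False)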
Upvotes: 0
Views: 1549
Reputation: 682
You can also consider obj.df.dropna(inplace=True). This will simply drop all rows that contain NaN in any of the columns.
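A minimal sketch (the column names in subset= are hypothetical); subset= limits the check to the columns that must not be NULL, so rows aren't discarded just because an optional field is missing:

# Drop only the rows missing a value in the required columns;
# NaN in any other (nullable) column is left alone.
obj.df.dropna(subset=['founded_at', 'employee_count'], inplace=True)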
Upvotes: 1
Reputation: 142
Wouldn't getting rid of the NaNs fix this? Using .fillna() would replace all of the NaN values with whatever you declare inside. You could put this within your scraping method.
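For example, a per-column fill (the column names and defaults are made up for illustration), since a single global fill value rarely suits mixed column types:

import pandas as pd

# A dict gives each column a type-appropriate default;
# columns not listed keep their NaN values.
obj.df.fillna({
    'name': 'unknown',                         # text column
    'employee_count': 0,                       # integer column
    'founded_at': pd.Timestamp('1970-01-01'),  # datetime column
}, inplace=True)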
Upvotes: 0