Rutger Hofste

Reputation: 4373

How to handle NaNs in a pandas dataframe integer column when writing to a PostgreSQL database

I have a pandas dataframe with a "year" column. However, some rows have an np.NaN value due to an outer merge, so the dtype of the column in pandas is float64 instead of integer (integer cannot store NaNs?). Next, I want to store the dataframe in a PostgreSQL database. For this I use:

df.to_sql()

Everything works fine, but my PostgreSQL column is now of type "double precision", and the np.NaN values are now [null]. This all makes sense, since the input column type was float64 rather than an integer type.
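For reference, a minimal sketch that reproduces the situation (the table name and connection string are placeholders):

import pandas as pd
from sqlalchemy import create_engine

left = pd.DataFrame({"id": [1, 2, 3], "year": [2015, 2016, 2017]})
right = pd.DataFrame({"id": [1, 2, 4]})

# The outer merge has no year for id 4, so "year" is upcast to float64.
df = right.merge(left, on="id", how="outer")
print(df["year"].dtype)  # float64

# Writing to PostgreSQL: "year" comes out as a double precision column.
engine = create_engine("postgresql://user:password@localhost:5432/mydb")
df.to_sql("my_table", engine, if_exists="replace", index=False)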

I was wondering if there is a way to store the results in an integer-type column while keeping the [null] values.

Example Notebook

Result of Ami's answer:


Upvotes: 6

Views: 4452

Answers (2)

Mesut Aslan

Reputation: 31

You could fill the NaN values with a sentinel such as -1 (or 0):

df.year = df.year.fillna(-1)
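Note that fillna alone leaves the dtype as float64; to actually end up with an integer column in the database, an explicit cast is also needed, along these lines:

df.year = df.year.fillna(-1).astype(int)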

Upvotes: 1

Ami Tavory

Reputation: 76297

(integer cannot store NaNs?)

No, they cannot. If you look at the PostgreSQL numeric types documentation, you can see that the byte sizes and ranges of the integer types are completely specified; every bit pattern represents an ordinary number, so there is nothing left over to represent a NaN.
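The same is true on the pandas side, which is why the NaN forces the float64 upcast in the first place. A quick illustration (a sketch):

import numpy as np
import pandas as pd

s = pd.Series([2015, np.nan])
print(s.dtype)  # float64 -- the NaN forced the upcast
s.astype(int)   # raises ValueError: cannot convert non-finite values to integer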

A common solution in this case is to decide, by convention, that some number logically means NaN. In your case, since the column holds years, you might choose a negative value (or just -1) for that. Before writing, you could use

df.year = df.year.fillna(-1).astype(int)

Alternatively, you can define another column, year_is_none, to record which rows were missing.
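For example (a sketch, reusing the df from the question):

# Record which rows were missing before filling the original column,
# so no information is lost when "year" is stored as an integer.
df["year_is_none"] = df["year"].isna()
df["year"] = df["year"].fillna(-1).astype(int)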

Alternatively, you can store them as floats.

These solutions range from most efficient to least efficient in terms of memory.

Upvotes: 6
