Tom

Reputation: 113

Solving incompatible dtype warning for pandas DataFrame when setting new column iteratively

Setting the value of a new dataframe column:

df.loc[df["Measure"] == metric.label, "source_data_url"] = metric.source_data_url

now (as of Pandas version 2.1.0) gives a warning,

FutureWarning:
Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '       metric_3' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.

The pandas documentation explains how to solve this for a Series, but it is not clear how to apply that when assigning a new DataFrame column iteratively: the line above is called in a loop over metrics, and it is the final metric that triggers the warning. How can this be done?

Upvotes: 11

Views: 48390

Answers (4)

Allen Shealey

Reputation: 1

I get this issue when I read in a DataFrame and then, if a condition is met, update a cell by row/column. I solved it by reading in the DataFrame with the argument dtype='object'.

Full version:

import pandas as pd
the_df = pd.read_excel('location_to_file.xlsx', dtype='object')
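As a rough illustration of the same idea without needing an Excel file, the sketch below uses pd.read_csv on an in-memory buffer as a stand-in for pd.read_excel. The column names and values are made up for the example.

```python
import io
import warnings

import pandas as pd

# Hypothetical data standing in for the spreadsheet in the answer above.
csv_text = "name,score\nalpha,1\nbeta,2\n"

# Read every column as object so later mixed-type assignments are allowed.
the_df = pd.read_csv(io.StringIO(csv_text), dtype="object")

with warnings.catch_warnings():
    # Turn the deprecation warning into an error to show it is not raised.
    warnings.simplefilter("error", FutureWarning)
    the_df.loc[the_df["name"] == "beta", "score"] = "missing"
```

Note that in this CSV stand-in the untouched cells stay as text ("1" rather than 1), so numeric columns would need converting back afterwards, for example with pd.to_numeric.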

Upvotes: 0

Maicon Mauricio

Reputation: 3031

My workaround was to cast the main Series/DataFrame to object before operating on it, do the operations, then cast it back to the appropriate type.

df = df.astype(object)
# [...] Do the operation here
df = df.astype(<type>)

I thought simply casting the result of the operation to the appropriate type would solve the problem but it didn't.
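A minimal sketch of that round trip, assuming it is acceptable to cast only the affected column rather than the whole DataFrame. The column name "value" and the placeholder string are invented for the example.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"value": [1.0, 2.0, 3.0]})

# Step 1: cast the column to object so any Python value can be stored.
df["value"] = df["value"].astype(object)

with warnings.catch_warnings():
    # Turn the deprecation warning into an error to show it is not raised.
    warnings.simplefilter("error", FutureWarning)
    # Step 2: do the operations, here a temporary string followed by a number.
    df.loc[0, "value"] = "n/a"
    df.loc[0, "value"] = 10.0

# Step 3: cast back once every value is numeric again.
df["value"] = df["value"].astype("float64")
```

The cast back only succeeds if every remaining value can be converted to the target type, so any placeholder strings have to be replaced first.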

Upvotes: 0

den-kar

Reputation: 91

Since pandas 2.1.0, setitem-like operations on a Series (or DataFrame column) that silently upcast the dtype are deprecated and show a warning.

In a future version, these will raise an error and you should cast to a common dtype first.

Previous behavior:

In [1]: ser = pd.Series([1, 2, 3])

In [2]: ser
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: ser[0] = 'not an int64'

In [4]: ser
Out[4]:
0    not an int64
1               2
2               3
dtype: object

New behavior:

In [1]: ser = pd.Series([1, 2, 3])

In [2]: ser
Out[2]:
0    1
1    2
2    3
dtype: int64

In [3]: ser[0] = 'not an int64'
FutureWarning:
  Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas.
  Value 'not an int64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.

In [4]: ser
Out[4]:
0    not an int64
1               2
2               3
dtype: object

To retain the current behaviour, you could cast ser to object dtype first:

In [21]: ser = pd.Series([1, 2, 3])

In [22]: ser = ser.astype('object')

In [23]: ser[0] = 'not an int64'

In [24]: ser
Out[24]: 
0    not an int64
1               2
2               3
dtype: object
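Casting to object is one option. When the new value is numeric, casting to a common numeric dtype first may be enough. The sketch below assumes an int64 Series that needs to hold a float; the values are illustrative only.

```python
import warnings

import pandas as pd

ser = pd.Series([1, 2, 3])

# Cast to a dtype that can hold both the existing ints and the new float.
ser = ser.astype("float64")

with warnings.catch_warnings():
    # Turn the deprecation warning into an error to show it is not raised.
    warnings.simplefilter("error", FutureWarning)
    ser.loc[0] = 1.5
```

This keeps the Series numeric, which object dtype would not.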

Source: https://pandas.pydata.org/docs/dev/whatsnew/v2.1.0.html#deprecated-silent-upcasting-in-setitem-like-series-operations

Upvotes: 3

lutrarutra

Reputation: 474

I had the same problem. My intuition is that when you set a value in the column source_data_url for the first time, the column does not yet exist, so pandas creates it and assigns NaN to all of its elements. This makes pandas treat the column's dtype as float64, and a later string assignment then raises this warning.

My solution was to create the column with some default value, e.g. empty string, before adding values to it:

df["source_data_url"] = ""

None also seems to work:

df["source_data_url"] = None
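Put together with the loop from the question, this might look like the sketch below. The Metric class, the labels and the example.com URLs are hypothetical stand-ins for the question's metric objects.

```python
import warnings
from dataclasses import dataclass

import pandas as pd


@dataclass
class Metric:
    # Hypothetical stand-in for the metric objects in the question.
    label: str
    source_data_url: str


metrics = [
    Metric("metric_1", "https://example.com/1"),
    Metric("metric_2", "https://example.com/2"),
]
df = pd.DataFrame({"Measure": ["metric_1", "metric_2", "metric_1", "metric_3"]})

# Create the column up front so it never starts out as float64 NaN.
df["source_data_url"] = None

with warnings.catch_warnings():
    # Turn the deprecation warning into an error to show it is not raised.
    warnings.simplefilter("error", FutureWarning)
    for metric in metrics:
        df.loc[df["Measure"] == metric.label, "source_data_url"] = metric.source_data_url
```

Rows whose Measure matches no metric (here "metric_3") keep the default value.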

Upvotes: 10
