Reputation: 113
Setting the value of a new dataframe column:
df.loc[df["Measure"] == metric.label, "source_data_url"] = metric.source_data_url
now (as of Pandas version 2.1.0) gives a warning,
FutureWarning:
Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value ' metric_3' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
The Pandas documentation discusses how the problem can be solved for a Series but it is not clear how to do this iteratively (the line above is called in a loop over metrics and it's the final metric that gives the warning) when assigning a new DataFrame column. How can this be done?
Upvotes: 11
Views: 48390
Reputation: 1
I get this issue when I read in a DataFrame and then, if a condition is met, update a cell by row/column. I solved it by reading in the DataFrame with the argument dtype='object'
Full version:
the_df = pd.read_excel('location_to_file.xlsx',dtype='object')
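A minimal sketch of the same idea, using pd.read_csv on an in-memory string as a stand-in for the Excel file so it runs without any file on disk. The column names and values here are made up for illustration. Note that with dtype='object' the numeric cells stay as strings, so you may need to convert them yourself later.

```python
import io

import pandas as pd

# Hypothetical data standing in for the spreadsheet contents.
csv_text = "name,score\nalice,1.5\nbob,2.5\n"

# dtype="object" stops pandas from inferring float64 for "score".
the_df = pd.read_csv(io.StringIO(csv_text), dtype="object")

# Writing a string into the column no longer changes its dtype.
the_df.loc[the_df["name"] == "bob", "score"] = "missing"

print(the_df)
```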
Upvotes: 0
Reputation: 3031
My workaround was casting the main Series/DataFrame data to object before doing the operations on it; do the operations, then cast it back to the appropriate type.
df = df.astype(object)
# [...] Do the operation here
df = df.astype(<type>)
I thought simply casting the result of the operation to the appropriate type would solve the problem but it didn't.
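Here is a small sketch of that round trip on a single column rather than the whole DataFrame. The column name "count" and the placeholder value are assumptions for illustration only; the cast back to int64 works here because every value is an integer again by the time it runs.

```python
import pandas as pd

df = pd.DataFrame({"count": [1, 2, 3]})

# Cast to object first so that mixed values are allowed in the column.
df["count"] = df["count"].astype(object)

# Temporarily store a non-numeric placeholder, then replace it with a number.
df.loc[0, "count"] = "n/a"
df.loc[0, "count"] = 10

# Cast back once every value is numeric again.
df["count"] = df["count"].astype("int64")

print(df.dtypes)
```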
Upvotes: 0
Reputation: 91
Since Pandas 2.1.0, setitem-like operations on a Series (or DataFrame column) that silently upcast the dtype are deprecated and show a warning.
In a future version, these will raise an error and you should cast to a common dtype first.
Previous behavior:
In [1]: ser = pd.Series([1, 2, 3])
In [2]: ser
Out[2]:
0 1
1 2
2 3
dtype: int64
In [3]: ser[0] = 'not an int64'
In [4]: ser
Out[4]:
0 not an int64
1 2
2 3
dtype: object
New behavior:
In [1]: ser = pd.Series([1, 2, 3])
In [2]: ser
Out[2]:
0 1
1 2
2 3
dtype: int64
In [3]: ser[0] = 'not an int64'
FutureWarning:
Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas.
Value 'not an int64' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
In [4]: ser
Out[4]:
0 not an int64
1 2
2 3
dtype: object
To retain the current behaviour, you could cast ser to object dtype first:
In [21]: ser = pd.Series([1, 2, 3])
In [22]: ser = ser.astype('object')
In [23]: ser[0] = 'not an int64'
In [24]: ser
Out[24]:
0 not an int64
1 2
2 3
dtype: object
Upvotes: 3
Reputation: 474
I had the same problem. My intuition is that when you set a value in the column source_data_url for the first time, the column does not yet exist, so pandas creates it and assigns NaN to all of its elements. This makes Pandas think that the column's dtype is float64, and it then emits this warning when a string is assigned.
My solution was to create the column with some default value, e.g. empty string, before adding values to it:
df["source_data_url"] = ""
Using None also seems to work:
df["source_data_url"] = None
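For the loop in the question, this could look roughly like the sketch below. The Metric namedtuple, the example URLs and the sample rows are hypothetical stand-ins for whatever the real metrics objects and data look like. The warnings are recorded only to show that no incompatible-dtype warning is emitted.

```python
import warnings
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the metric objects used in the question.
Metric = namedtuple("Metric", ["label", "source_data_url"])
metrics = [
    Metric("metric_1", "https://example.com/1"),
    Metric("metric_3", "https://example.com/3"),
]

df = pd.DataFrame({"Measure": ["metric_1", "metric_2", "metric_3"]})

# Create the column up front so it is not created as float64 full of NaN.
df["source_data_url"] = None

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for metric in metrics:
        df.loc[df["Measure"] == metric.label, "source_data_url"] = metric.source_data_url

# Keep only warnings about incompatible dtypes, if any were raised.
dtype_warnings = [w for w in caught if "incompatible dtype" in str(w.message)]

print(df)
print(len(dtype_warnings))
```

Rows with no matching metric (metric_2 here) keep the default value.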
Upvotes: 10