Tom Mori
Tom Mori

Reputation: 191

Pandas TypeError: Could not convert to numeric

I am working on a project where I imported data from SQL into a pandas DataFrame. This seems to go swimmingly, but when I take the pandas.mean() it throws a TypeError saying that a concatenated list of the values cannot be converted to numeric (see below):

Example Dataframe:

  ProductSKU OverallHeight
0   AAI2185           74.5
1   AAI2275             47
2   AAI2686           56.5
3  AASA1002          73.23

Function Call:

avgValue = df["OverallHeight"].dropna().mean()

Console Output:

Traceback (most recent call last):
  File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\generic.py", line 5310, in stat_func
    numeric_only=numeric_only)
  ...
  File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\nanops.py", line 293, in nanmean
    the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
  File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\nanops.py", line 743, in _ensure_numeric
    raise TypeError('Could not convert %s to numeric' % str(x))
TypeError: Could not convert 74.54756.573.23 to numeric

The strangest thing is that it works perfectly fine when I import the same data through a CSV. It only breaks when I load it through SQL, could there be something I did incorrectly there?

Upvotes: 15

Views: 41148

Answers (4)

Gonçalo Peres
Gonçalo Peres

Reputation: 13622

As the console output reveals, there's a problem with the dataframe column 'OverallHeight'.

My guess, as Warren Weckesser commented, is that the column contains strings. In order to check the data types of that column run

print(df['OverallHeight'].dtype) 

Assuming that the above is true, converting the column data type to float should solve the problem. For that use to_numeric

df["OverallHeight"] = pd.to_numeric(df["OverallHeight"], downcast="float")

Upvotes: 3

cottontail
cottontail

Reputation: 23439

If you got a similar TypeError after a groupby operation (e.g. TypeError: Could not convert ace to numeric), then you probably have pandas>=2.0.

groupby.mean() has numeric_only= argument whose default value was True in the past but since pandas 2.0, its default value is False. An implication is that string columns are not dropped when a statistical method such as mean or std is called on the groupby object (as was done in the past). To solve the issue, pass numeric_only=True.

An example that shows the problem and the solution.

import pandas as pd

df = pd.DataFrame({
    "Grouper": ["A", "B", "A", "B", "A"],
    "Name": ["a", "b", "c", "d", "e"],
    "Value": [0.95, 0.25, 0.25, 0.10, 1.00]
})

grouped = df.groupby("Grouper").mean()
# TypeError: Could not convert ace to numeric

grouped = df.groupby("Grouper").mean(numeric_only=True)
# OK

Upvotes: 25

Ayush Agarwal
Ayush Agarwal

Reputation: 1

This will helps to convert obj to float type.

df['column_name'] = pd.to_numeric(df['column_name'], errors='coerce')

Upvotes: -1

seconded
seconded

Reputation: 1

Chances are ToptoBottom's contents are being read as a string. If you are opening a file with pythons read mode, and splitting the string based on new lines and commas the numbers are converted to string, this is why panda cant read the data or use it as a mean. Thus you get the error:

TypeError: Could not convert 74.54756.573.23 to numeric

to remove the conversion issue, you want to convert it to numeric values, you can use pandas to_numeric() function

Upvotes: -2

Related Questions