Reputation: 191
I am working on a project where I imported data from SQL into a pandas DataFrame. This seems to go swimmingly, but when I take the pandas.mean()
it throws a TypeError saying that a concatenated list of the values cannot be converted to numeric (see below):
Example Dataframe:
ProductSKU OverallHeight
0 AAI2185 74.5
1 AAI2275 47
2 AAI2686 56.5
3 AASA1002 73.23
Function Call:
avgValue = df["OverallHeight"].dropna().mean()
Console Output:
Traceback (most recent call last):
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\generic.py", line 5310, in stat_func
numeric_only=numeric_only)
...
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\nanops.py", line 293, in nanmean
the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
File "C:\Program Files\Anaconda\lib\site-packages\pandas\core\nanops.py", line 743, in _ensure_numeric
raise TypeError('Could not convert %s to numeric' % str(x))
TypeError: Could not convert 74.54756.573.23 to numeric
The strangest thing is that it works perfectly fine when I import the same data through a CSV. It only breaks when I load it through SQL, could there be something I did incorrectly there?
Upvotes: 15
Views: 41148
Reputation: 13622
As the console output reveals, there's a problem with the dataframe column 'OverallHeight'
.
My guess, as Warren Weckesser commented, is that the column contains strings. In order to check the data types of that column run
print(df['OverallHeight'].dtype)
Assuming that the above is true, converting the column data type to float should solve the problem. For that use to_numeric
df["OverallHeight"] = pd.to_numeric(df["OverallHeight"], downcast="float")
Upvotes: 3
Reputation: 23439
If you got a similar TypeError after a groupby operation (e.g. TypeError: Could not convert ace to numeric
), then you probably have pandas>=2.0.
groupby.mean()
has numeric_only=
argument whose default value was True in the past but since pandas 2.0, its default value is False. An implication is that string columns are not dropped when a statistical method such as mean
or std
is called on the groupby object (as was done in the past). To solve the issue, pass numeric_only=True
.
An example that shows the problem and the solution.
import pandas as pd
df = pd.DataFrame({
"Grouper": ["A", "B", "A", "B", "A"],
"Name": ["a", "b", "c", "d", "e"],
"Value": [0.95, 0.25, 0.25, 0.10, 1.00]
})
grouped = df.groupby("Grouper").mean()
# TypeError: Could not convert ace to numeric
grouped = df.groupby("Grouper").mean(numeric_only=True)
# OK
Upvotes: 25
Reputation: 1
This will helps to convert obj to float type.
df['column_name'] = pd.to_numeric(df['column_name'], errors='coerce')
Upvotes: -1
Reputation: 1
Chances are ToptoBottom's contents are being read as a string. If you are opening a file with pythons read mode, and splitting the string based on new lines and commas the numbers are converted to string, this is why panda cant read the data or use it as a mean. Thus you get the error:
TypeError: Could not convert 74.54756.573.23 to numeric
to remove the conversion issue, you want to convert it to numeric values, you can use pandas to_numeric() function
Upvotes: -2