Reputation: 192
I am new to Python. I have a .csv dataset with a column called BasePay.
Most of the values in the column are of type int, but some are the string "Not Provided".
I am trying to get the mean value of BasePay with:
sal['BasePay'].mean()
But it gives me this error:
TypeError: can only concatenate str (not "int") to str
I want to omit the string values. How can I do that?
Thanks.
Upvotes: 1
Views: 2927
Reputation: 61
If you store the data from the BasePay column in a list l, you can do as follows:
x = []  # collects only the integer values
for i in l:
    if type(i) == int:  # skip strings such as "Not Provided"
        x.append(i)
mean = sum(x) / len(x)
print(mean)
Upvotes: 1
Reputation: 862601
Because the column contains some non-numeric values, use to_numeric with errors='coerce' to convert them to NaNs, so that mean works correctly:
out = pd.to_numeric(sal['BasePay'], errors='coerce').mean()
Sample:
import pandas as pd
sal = pd.DataFrame({'BasePay':[1, 'Not Provided', 2, 3, 'Not Provided']})
print (sal)
        BasePay
0             1
1  Not Provided
2             2
3             3
4  Not Provided
print (pd.to_numeric(sal['BasePay'], errors='coerce'))
0    1.0
1    NaN
2    2.0
3    3.0
4    NaN
Name: BasePay, dtype: float64
out = pd.to_numeric(sal['BasePay'], errors='coerce').mean()
print (out)
2.0
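If the data comes straight from a CSV file, another possibility is to mark the placeholder as missing at load time with the na_values parameter of read_csv. The following is only a sketch: the inline CSV text stands in for the real file and is an assumption, as is the idea that "Not Provided" is the only non-numeric placeholder in the column.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real .csv file (assumption for illustration).
csv_text = "BasePay\n1\nNot Provided\n2\n3\nNot Provided\n"

# Treat the placeholder string as missing while parsing,
# so the column is read as numeric with NaN in those rows.
sal_csv = pd.read_csv(io.StringIO(csv_text), na_values=["Not Provided"])

# mean() skips NaN by default (skipna=True).
csv_mean = sal_csv["BasePay"].mean()
print(csv_mean)
```

With a real file, the io.StringIO object would be replaced by the file path.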
Upvotes: 4
Reputation: 11
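When you import the dataset, pandas fills empty fields with NaN. Once the "Not Provided" strings have also been converted to NaN (for example with pd.to_numeric and errors='coerce'), you have two options: 1. replace the NaNs with 0 using fillna(0), or 2. remove the NaNs using dropna(). Note that replacing with 0 changes the mean, while dropping does not.
The mean can also be computed with np.nanmean(), which ignores NaNs, provided the column is numeric.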
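To make the difference between these options concrete, here is a minimal sketch. It assumes the column has already been converted to numeric values with NaN for the missing entries; the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative numeric column with NaN where the value was missing (assumed data).
base_pay = pd.Series([1, np.nan, 2, 3, np.nan], name="BasePay")

# Option 1: replace NaN with 0. The zeros count toward the mean and pull it down.
mean_filled = base_pay.fillna(0).mean()

# Option 2: drop NaN rows, then average only the remaining values.
mean_dropped = base_pay.dropna().mean()

# np.nanmean ignores NaN and gives the same result as dropping them.
mean_nan = np.nanmean(base_pay.to_numpy())

print(mean_filled, mean_dropped, mean_nan)
```

Since Series.mean() already skips NaN by default, dropping the NaNs first is mainly a matter of being explicit.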
Upvotes: 1