Reputation: 6020
I have a dataframe with the following dtypes.
> df.dtypes
Col1 float64
Col2 object
dtype: object
When I do the following:
df['Col3'] = df['Col2'].apply(lambda s: len(s) >= 2 and s[0].isalpha())
I get:
TypeError: object of type 'float' has no len()
I believe that if I convert "object" to a string type, I will be able to do what I want. However, when I do the following:
df['Col2'] = df['Col2'].astype(str)
the dtype of Col2 doesn't change. I am a little confused by the "object" dtype in pandas. What exactly is "object"?
More info: this is what Col2 looks like:
Col2
1 F5
2 K3V
3 B9
4 F0V
5 G8III
6 M0V:
7 G0
8 M6e-M8.5e Tc
Upvotes: 28
Views: 121138
Reputation: 23011
If the column dtype is object, TypeError: object of type 'float' has no len() often occurs if the column contains NaN. Check if that's the case by calling:
df['Col2'].isna().any()
If it returns True, then there's NaN and you probably need to handle that.
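One way to handle the NaN case is to guard the lambda with a type check, since NaN is a float and has no len(). This is only a minimal sketch: the sample values and the helper name starts_with_letter are made up for illustration, and treating missing rows as False is an assumption about what you want.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three strings and one missing value.
df = pd.DataFrame({'Col2': pd.Series(['F5', 'K3V', np.nan, '5G'], dtype=object)})

def starts_with_letter(s):
    # NaN is a float, so only call len() and indexing on real strings.
    return isinstance(s, str) and len(s) >= 2 and s[0].isalpha()

df['Col3'] = df['Col2'].apply(starts_with_letter)
print(df['Col3'].tolist())
```

Rows holding NaN (or any other non-string) come out as False instead of raising.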
str. methods

If null handling is not important, you can also call vectorized methods such as str.len(), str.isdigit(), etc. For example, the code in the OP can be written as:
df['Col3'] = df['Col2'].str.len().ge(2) & df['Col2'].str[0].str.isalpha()
to get the desired output without errors.
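To see why the vectorized methods don't raise, here is a small sketch on a made-up object column with one NaN. The variable names are hypothetical; the point is only that str.len() leaves the missing row missing, and the comparison then evaluates to False for that row.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value, kept as object dtype.
s = pd.Series(['F5', 'K3V', np.nan], dtype=object)

lengths = s.str.len()          # the NaN row stays NaN instead of raising
long_enough = lengths.ge(2)    # comparing NaN with 2 gives False

print(lengths.isna().tolist())
print(long_enough.tolist())
```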
Since pandas 1.0, there's a new 'string' dtype that keeps missing values as <NA> after you cast a column to it. For example, if you want to convert floats to strings without decimals, yet the column contains NaN values that you want to keep as null, you can use the 'string' dtype.
df = pd.DataFrame({
'Col1': [1.2, 3.4, 5.5, float('nan')]
})
df['Col1'] = df['Col1'].astype('string').str.split('.').str[0]
df['Col1'] is then:
0 1
1 3
2 5
3 <NA>
Name: Col1, dtype: object
where <NA> is the missing-value marker (pd.NA), which you can drop with dropna(), while df['Col1'].astype(str) casts NaNs into strings.
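As a rough illustration of the 'string' side of that contrast, the sketch below casts a made-up float column and then drops the missing entry. The variable names are hypothetical, and the exact behaviour of astype(str) on NaN may depend on your pandas version, so it is left out here.

```python
import pandas as pd

# Hypothetical float column with one missing value.
s = pd.Series([1.2, float('nan')])

as_string = s.astype('string')   # the missing value is kept as <NA>
kept = as_string.dropna()        # so it can still be dropped afterwards

print(as_string.isna().tolist())
print(len(kept))
```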
Upvotes: 1
Reputation: 14738
If a column contains strings or is treated as strings, it will have a dtype of object (but the converse is not necessarily true; more below). Here is a simple example:
import pandas as pd
df = pd.DataFrame({'SpT': ['string1', 'string2', 'string3'],
'num': ['0.1', '0.2', '0.3'],
'strange': ['0.1', '0.2', 0.3]})
print(df.dtypes)
#SpT object
#num object
#strange object
#dtype: object
If a column contains only strings, applying len to it, as you did, should work fine:
print(df['num'].apply(lambda x: len(x)))
#0 3
#1 3
#2 3
However, a dtype of object does not mean the column contains only strings. For example, the column strange contains objects of mixed types: some str and a float. Applying the function len will raise an error similar to the one you have seen:
print(df['strange'].apply(lambda x: len(x)))
# TypeError: object of type 'float' has no len()
Thus, the problem could be that you have not properly converted the column to string, and the column still contains mixed object types.
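A quick way to check whether an object column really holds mixed types is to count the Python type of each element. This is just a sketch on a made-up column mirroring strange; the variable names are hypothetical. Note that NaN would also show up as float here.

```python
import pandas as pd

# Hypothetical object column mixing two strings and one float.
strange = pd.Series(['0.1', '0.2', 0.3], dtype=object)

# Map each element to the name of its Python type, then count occurrences.
type_counts = strange.map(lambda x: type(x).__name__).value_counts()
print(type_counts.to_dict())
```

If anything other than str appears in the counts, len will fail on those rows.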
Continuing the above example, let us convert strange to strings and check if apply works:
df['strange'] = df['strange'].astype(str)
print(df['strange'].apply(lambda x: len(x)))
#0 3
#1 3
#2 3
(There is a suspicious discrepancy between df_cleaned and df_clean in your question. Is it a typo, or a mistake in the code that causes the problem?)
Upvotes: 38