Reputation:
The test.csv file looks like this:
device_id,upload_time
12345678901,2020-06-01 07:40:20+00:00
123456,2020-06-01 07:40:40+00:00
123456,2020-06-01 07:41:00+00:00
123456,2020-06-01 07:41:02+00:00
123456,2020-06-01 07:41:04+00:00
123456,2020-06-01 07:41:08+00:00
12345678901,2020-06-01 07:41:10+00:00
12345678901,2020-06-01 07:41:18+00:00
12345678901,2020-06-01 07:41:20+00:00
,2020-06-01 07:41:24+00:00
,2020-06-01 07:41:40+00:00
12345678901,2020-06-01 07:42:00+00:00
12345678901,2020-06-01 07:42:20+00:00
12345678901,2020-06-01 07:42:22+00:00
12345678901,2020-06-01 07:42:24+00:00
12345678901,2020-06-01 07:42:26+00:00
12345678901,2020-06-01 07:42:28+00:00
12345678901,2020-06-01 07:42:40+00:00
1234,2020-06-01 07:43:00+00:00
1234,2020-06-01 07:43:12+00:00
You can convert device_id to int or str, no problem.
I use this code to get two new DataFrames.
import pandas as pd
df = pd.read_csv(r'test.csv', encoding='utf-8', parse_dates=[1])
df = df[pd.notnull(df['device_id'])] #Delete rows where device_id is null.
a = df[df['device_id'].map(len)!=11] #Get data whose device_id length is not 11.
b = df[df['device_id'].map(len)==11] #Get data whose device_id length is 11.
But the error message is:
TypeError: object of type 'float' has no len()
What is wrong?
Upvotes: 1
Views: 1125
Reputation: 1658
For the input file you have specified, the device_id column is read as a float datatype, even though every value present is an integer. This happens because the column contains empty fields: pandas stores them as NaN, which is a float, so the whole column becomes float64. Calculating the length then runs into a problem:
Example:
len('12345')
#will give you len = 5, which is the correct length
whereas,
len('12345.0')
#will give you len = 7, which is wrong since it considers the decimal point too
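To see where the float comes from, here is a minimal sketch that reads a shortened copy of the question's data from an in-memory string (the `csv_text` sample and the variable names are assumptions made for illustration). It checks the dtype pandas infers and reproduces the TypeError from the question:

```python
import io
import pandas as pd

# Shortened inline sample mirroring the question's file; one row has no device_id.
csv_text = """device_id,upload_time
12345678901,2020-06-01 07:40:20+00:00
123456,2020-06-01 07:40:40+00:00
,2020-06-01 07:41:24+00:00
1234,2020-06-01 07:43:00+00:00
"""

df = pd.read_csv(io.StringIO(csv_text))

# The empty field becomes NaN, which forces the whole column to float64.
inferred_dtype = str(df['device_id'].dtype)

# Calling len on float values raises the TypeError seen in the question.
try:
    df['device_id'].map(len)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(inferred_dtype)
print(error_message)
```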
So it is better to convert the column to int and then perform the length check on the str version of that int column, as shown in the code below:
Reference:
The argument to len may be a sequence (string, tuple or list) or a mapping (dictionary). https://docs.python.org/2/library/functions.html#len
Before calling len, you should verify that the argument is one of these types. You can use isinstance() to check it. https://docs.python.org/2/library/functions.html#isinstance
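As a rough illustration of the isinstance() check described above, the sketch below guards a call to len. The helper name `safe_len` is hypothetical, and using `collections.abc.Sized` (which matches any object that defines `__len__`) is one possible choice rather than the only one:

```python
from collections.abc import Sized


def safe_len(value):
    """Return len(value) when value supports len(), otherwise None.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, Sized):
        return len(value)
    return None


print(safe_len('12345678901'))   # a str supports len()
print(safe_len(12345678901.0))   # a float does not, so None is returned
```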
So try this,
import pandas as pd
df = pd.read_csv(r'sample.csv', parse_dates=[1])
df = df[pd.notnull(df['device_id'])] #Delete rows where device_id is null.
#Convert to a 64-bit int (12345678901 does not fit in a 32-bit int)
df['device_id'] = df['device_id'].astype('int64')
#len cannot be computed on an int column directly. Convert to str first, then compute len
a = df[df['device_id'].astype(str).map(len)!=11]
b = df[df['device_id'].astype(str).map(len)==11]
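Since the question says device_id may be kept as a str, another option is to avoid the float conversion altogether by asking read_csv for strings. This is only a sketch on a shortened inline sample (the `csv_text` data and the `is_eleven` name are assumptions); with a real file you would pass the file path instead of the StringIO object:

```python
import io
import pandas as pd

# Shortened inline sample mirroring the question's file.
csv_text = """device_id,upload_time
12345678901,2020-06-01 07:40:20+00:00
123456,2020-06-01 07:40:40+00:00
,2020-06-01 07:41:24+00:00
1234,2020-06-01 07:43:00+00:00
"""

# dtype keeps device_id as text, so no '.0' suffix and no float values appear.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={'device_id': str},
    parse_dates=['upload_time'],
)

# Empty fields are still read as missing values, so drop them first.
df = df.dropna(subset=['device_id'])

is_eleven = df['device_id'].str.len() == 11
a = df[~is_eleven]  # device_id length is not 11
b = df[is_eleven]   # device_id length is 11
```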
Upvotes: 1
Reputation: 2615
The code below should help.
Converting the value to a string makes it possible to count its digits. Cast the float to int first, so the string has no trailing '.0'.
import pandas as pd
df = pd.read_csv(r'test.csv', encoding='utf-8', parse_dates=[1])
# Remove rows with null (NaN) values; any one of the three lines below is enough
df = df.dropna()
# or
df = df[df['device_id'].isnull()==False]
# or
df = df[df['device_id'].isna()==False]
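# device_id was read as float; cast to int first, otherwise str() gives '12345678901.0'
df['device_id'] = df['device_id'].astype('int64')
a = df[df['device_id'].astype(str).map(len)!=11]
b = df[df['device_id'].astype(str).map(len)==11]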
Another approach:
a = df[df['device_id'].astype(str).str.len()!=11]
b = df[df['device_id'].astype(str).str.len()==11]
Another approach:
a = df[df['device_id'].astype(str).apply(len)!=11]
b = df[df['device_id'].astype(str).apply(len)==11]
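The map(len), .str.len() and apply(len) variants should select the same rows. Here is a quick check on a shortened inline sample, assuming device_id has been cast to int64 before the string conversion (the `csv_text` data and the mask names are assumptions for illustration):

```python
import io
import pandas as pd

# Shortened inline sample mirroring the question's file.
csv_text = """device_id,upload_time
12345678901,2020-06-01 07:40:20+00:00
123456,2020-06-01 07:40:40+00:00
,2020-06-01 07:41:24+00:00
1234,2020-06-01 07:43:00+00:00
"""

df = pd.read_csv(io.StringIO(csv_text)).dropna(subset=['device_id'])

# Cast float -> int64 -> str so the strings carry no trailing '.0'.
ids = df['device_id'].astype('int64').astype(str)

mask_map = (ids.map(len) == 11).tolist()
mask_str = (ids.str.len() == 11).tolist()
mask_apply = (ids.apply(len) == 11).tolist()

# True when all three variants flag exactly the same rows.
all_agree = mask_map == mask_str == mask_apply
print(all_agree)
```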
Upvotes: 0