Reputation: 3852
I have a df like this:
num1 num2
0 [2.0] 10
1 [3.0] 20
2 [4.0] 30
3 [5.0] 40
4 [6.0] 50
5 [nan] 60
6 [nan] 70
7 [10.0] 80
8 [nan] 90
9 [15.0] 100
The num1 column contains arrays of floats. [nan] is a numpy array containing a single np.NaN.
I am converting this to integers via this:
df['num1'] = list(map(int, df['num1']))
If I just use this df:
num1 num2
0 [2.0] 10
1 [3.0] 20
2 [4.0] 30
3 [5.0] 40
4 [6.0] 50
This works when there are no [nan] and I get:
num1 num2
0 2.0 10
1 3.0 20
2 4.0 30
3 5.0 40
4 6.0 50
But if I include the full df with [nan], I get the error:
`ValueError: cannot convert float NaN to integer`
I tried doing:
df[df['num1'] != np.array(np.NaN)]
But this gave the error:
TypeError: len() of unsized object
How can I get the desired output:
num1 num2
0 2.0 10
1 3.0 20
2 4.0 30
3 5.0 40
4 6.0 50
5 10.0 80
6 15.0 100
Upvotes: 2
Views: 202
Reputation: 6376
df['num1'] = df.num1.str[0]
df.dropna(axis=0, inplace=True)
A solution inspired by suleiman's answer, but without using loc. Here is the output:
num1 num2
0 2.0 10
1 3.0 20
2 4.0 30
3 5.0 40
4 6.0 50
7 10.0 80
9 15.0 100
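The output above keeps the original index labels 7 and 9, whereas the desired output in the question is numbered 0 to 6. A minimal sketch of one way to renumber the rows, assuming the frame is rebuilt as below (the `values` list and the construction are illustrative, not taken from the original post):

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame:
# every num1 cell is a one-element NumPy array.
values = [2.0, 3.0, 4.0, 5.0, 6.0, np.nan, np.nan, 10.0, np.nan, 15.0]
df = pd.DataFrame({
    "num1": [np.array([v]) for v in values],
    "num2": list(range(10, 101, 10)),
})

# Unwrap the single element of each array, as in the answer.
df["num1"] = df["num1"].str[0]

# Drop the NaN rows, then renumber the index from 0.
df = df.dropna(subset=["num1"]).reset_index(drop=True)
print(df)
```

`reset_index(drop=True)` discards the old labels instead of keeping them as a new column.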
Upvotes: 0
Reputation: 9081
Try this -
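df['num1'] = df['num1'].apply(lambda x: x[0]) # unwrap the single number from each array (assuming there is only one per array)
df = df.dropna(subset=['num1']).astype({'num1': int}) # drop the NaN rows first (assigning a .dropna() result back to the column would realign on the index and restore the NaNs), then convert to int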
print(df)
Output
num1 num2
0 2 10
1 3 20
2 4 30
3 5 40
4 6 50
7 10 80
9 15 100
Timings (depends on size of data)
# My solution
# 2.6 ms ± 327 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# @O.Suleiman's solution
# 2.8 ms ± 457 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# @ Anton vBR's solution
# 2.96 ms ± 504 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Upvotes: 0
Reputation: 918
This should get rid of all those nan lists, just add the following:
df = df.loc[df['num1'].str[0].dropna().index]
Then you can run the rest of your code as it is.
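For reference, a sketch of the filter followed by the integer conversion, run end to end. The frame construction is an assumption made to keep the example self-contained. The conversion unwraps each array with `a[0]` before calling `int`, a small change from the question's `map(int, ...)`, because recent NumPy releases warn when a non-scalar array is converted directly.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame.
values = [2.0, 3.0, 4.0, 5.0, 6.0, np.nan, np.nan, 10.0, np.nan, 15.0]
df = pd.DataFrame({
    "num1": [np.array([v]) for v in values],
    "num2": list(range(10, 101, 10)),
})

# Keep only the rows whose single element is not NaN.
# .copy() makes the result an independent frame before it is modified.
df = df.loc[df["num1"].str[0].dropna().index].copy()

# Unwrap each one-element array and convert it to a Python int.
df["num1"] = [int(a[0]) for a in df["num1"]]
print(df)
```

The surviving rows keep their original index labels; append `.reset_index(drop=True)` if a fresh 0-based index is wanted.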
Upvotes: 2
Reputation: 18906
As you can see there are many options. You can convert to numeric and then remove nulls:
import pandas as pd
import numpy as np
data = dict(num1=[[2.0],[np.nan],['apple']])
df = pd.DataFrame(data)
m = pd.to_numeric(df['num1'].apply(lambda x: x[0]),errors='coerce').dropna().index
df = df.loc[m]
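The snippet above filters the rows but leaves the one-element lists in num1. If the goal is an integer column, a possible follow-up is to store the coerced values and convert them, sketched here with a hypothetical `num2` column added for illustration:

```python
import numpy as np
import pandas as pd

# Same sample data as above, plus a hypothetical num2 column.
df = pd.DataFrame({
    "num1": [[2.0], [np.nan], ["apple"]],
    "num2": [10, 20, 30],
})

# Unwrap each list, then coerce anything non-numeric ('apple') to NaN.
unwrapped = df["num1"].apply(lambda x: x[0])
df["num1"] = pd.to_numeric(unwrapped, errors="coerce")

# Drop the NaN rows and cast the remaining values to int.
df = df.dropna(subset=["num1"]).astype({"num1": int})
print(df)
```

Only the first row survives here, since both `np.nan` and `'apple'` end up as NaN after coercion.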
Upvotes: 0
Reputation: 13401
You can do it as below:
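# convert arrays containing NaN into np.nan and unwrap the others to their single value
df['num1']=df['num1'].apply(lambda x: np.nan if np.isnan(x).any() else x[0])
# use dropna to drop the rows where num1 is NaN (keeping num2), then renumber the index
df=df.dropna(subset=['num1']).reset_index(drop=True)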
print(df)
Output:
num1 num2
0 2.0 10
1 3.0 20
2 4.0 30
3 5.0 40
4 6.0 50
5 10.0 80
6 15.0 100
Upvotes: 0