Reputation: 9024
self.df['Regular Price'] = self.df['Regular Price'].apply(
    lambda x: int(round(x)) if isinstance(x, (int, float)) else None
)
The code above assigns None to every value in the Regular Price column
whenever it encounters a non-numeric value in the DataFrame. I want to assign None
only to the cells that hold a non-numeric value.
Thanks.
Upvotes: 2
Views: 889
Reputation: 863216
First, it is impossible to keep NaNs in an integer column, because NaN is a float by design.
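A minimal sketch of that upcasting, assuming only pandas and NumPy are installed; the Series names are made up for illustration:

```python
import numpy as np
import pandas as pd

# A column of whole numbers gets an integer dtype.
whole = pd.Series([1, 2, 7])
print(whole.dtype)

# A single missing value forces the whole column to float64,
# because NaN itself is a float.
with_missing = pd.Series([1, np.nan, 7])
print(with_missing.dtype)   # float64
print(type(np.nan))         # <class 'float'>
```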
Your solution works if the column holds mixed types, that is, numeric values together with strings:
import pandas as pd

df = pd.DataFrame({
    'Regular Price': ['a', 1, 2.3, 'a', 7],
    'B': list(range(5))
})
print (df)
B Regular Price
0 0 a
1 1 1
2 2 2.3
3 3 a
4 4 7
df['Regular Price'] = df['Regular Price'].apply(
    lambda x: int(round(x)) if isinstance(x, (int, float)) else None
)
print (df)
B Regular Price
0 0 NaN
1 1 1.0
2 2 2.0
3 3 NaN
4 4 7.0
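Two edge cases may be worth guarding against in the lambda: bool is a subclass of int, so True would become 1, and round() raises ValueError on a float NaN that is already in the column. A hedged sketch of a stricter version follows; the helper name to_rounded_int is hypothetical, and it assumes the column holds plain Python objects rather than NumPy scalars:

```python
import math

import pandas as pd


def to_rounded_int(value):
    """Hypothetical helper: round real numbers, map everything else to None."""
    if isinstance(value, bool):
        # bool passes isinstance(value, int), so reject it explicitly.
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, float) and math.isfinite(value):
        # Finite floats only: round() raises on NaN and infinity.
        return int(round(value))
    return None


prices = pd.Series(['a', 1, 2.3, float('nan'), True, 7], dtype=object)
cleaned = prices.apply(to_rounded_int)
print(cleaned)
```

Only the cells that are not real numbers end up missing; the other cells keep their rounded values.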
But if all the data are strings, you need to_numeric with errors='coerce' to convert the non-numeric values to NaNs:
df = pd.DataFrame({
    'Regular Price': ['a', '1', '2.3', 'a', '7'],
    'B': list(range(5))
})
print (df)
B Regular Price
0 0 a
1 1 1
2 2 2.3
3 3 a
4 4 7
df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
print (df)
B Regular Price
0 0 NaN
1 1 1.0
2 2 2.0
3 3 NaN
4 4 7.0
EDIT:
"I also need to remove floating points and use int only"
This is possible by converting the NaNs to None and casting the remaining values to int:
import numpy as np

df['Regular Price'] = pd.to_numeric(df['Regular Price'],
                                    errors='coerce').round()
df['Regular Price'] = np.where(df['Regular Price'].isnull(),
                               None,
                               df['Regular Price'].fillna(0).astype(int))
print (df)
B Regular Price
0 0 None
1 1 1
2 2 2
3 3 None
4 4 7
print (df['Regular Price'].apply(type))
0 <class 'NoneType'>
1 <class 'int'>
2 <class 'int'>
3 <class 'NoneType'>
4 <class 'int'>
Name: Regular Price, dtype: object
But this slows performance, so it is best not to use it. It also causes other problems, because some functions fail on the resulting object column, so it is best to keep floats when working with NaNs:
Testing some functions, such as diff, on a 50k-row DataFrame:
df = pd.DataFrame({
    'Regular Price': ['a', '1', '2.3', 'a', '7'],
    'B': list(range(5))
})
df = pd.concat([df]*10000).reset_index(drop=True)
print (df)
df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
df['Regular Price1'] = np.where(df['Regular Price'].isnull(),
                                None,
                                df['Regular Price'].fillna(0).astype(int))
In [252]: %timeit df['Regular Price2'] = df['Regular Price1'].diff()
TypeError: unsupported operand type(s) for -: 'int' and 'NoneType'
In [274]: %timeit df['Regular Price3'] = df['Regular Price'].diff()
1000 loops, best of 3: 301 µs per loop
In [272]: %timeit df['Regular Price2'] = df['Regular Price1'] * 1000
100 loops, best of 3: 4.48 ms per loop
In [273]: %timeit df['Regular Price3'] = df['Regular Price'] * 1000
1000 loops, best of 3: 469 µs per loop
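Another option, assuming pandas 0.24 or newer, is the nullable integer dtype 'Int64' (capital I), which stores whole numbers alongside missing values without falling back to an object column. A minimal sketch on the same sample data; the variable names rounded and scaled are illustrative, and no timings are claimed for it:

```python
import pandas as pd

df = pd.DataFrame({
    'Regular Price': ['a', '1', '2.3', 'a', '7'],
    'B': list(range(5))
})

# Coerce non-numeric strings to NaN and round, as before.
rounded = pd.to_numeric(df['Regular Price'], errors='coerce').round()

# Cast the integer-valued floats to the nullable integer dtype.
df['Regular Price'] = rounded.astype('Int64')
print(df)

# Arithmetic still works; missing cells stay missing.
scaled = df['Regular Price'] * 1000
print(scaled)
```

The rounding step matters here: the cast to 'Int64' expects the floats to already be whole numbers.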
EDIT:
df = pd.DataFrame({
    'Regular Price': ['a', '1', '2.3', 'a', '7'],
    'B': list(range(5))
})
print (df)
B Regular Price
0 0 a
1 1 1
2 2 2.3
3 3 a
4 4 7
df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
print (df)
B Regular Price
0 0 NaN
1 1 1.0
2 2 2.0
3 3 NaN
4 4 7.0
First, it is possible to remove the rows that have NaNs in the Regular Price column and then convert the column to int.
df1 = df.dropna(subset=['Regular Price']).copy()
df1['Regular Price'] = df1['Regular Price'].astype(int)
print (df1)
B Regular Price
1 1 1
2 2 2
4 4 7
Process what you need, but don't change the index.
# e.g. some processing step
df1['Regular Price'] = df1['Regular Price'] * 100
Finally, use combine_first to restore the dropped rows; they come back with NaN in the Regular Price column.
df2 = df1.combine_first(df)
print (df2)
B Regular Price
0 0.0 NaN
1 1.0 100.0
2 2.0 200.0
3 3.0 NaN
4 4.0 700.0
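In the output above, column B was upcast to float as a side effect of combine_first. If that matters, one alternative sketch is to skip the split-and-recombine step and update only the non-missing rows through a boolean mask. This is a different technique from combine_first, and the name mask is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Regular Price': ['a', '1', '2.3', 'a', '7'],
    'B': list(range(5))
})
df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()

# True for the rows that hold a real number.
mask = df['Regular Price'].notna()

# Process only those rows; the NaN rows and column B are left untouched.
df.loc[mask, 'Regular Price'] = df.loc[mask, 'Regular Price'] * 100
print(df)
```

The Regular Price column stays float here because it still contains NaNs, but the index and the other columns are never rebuilt.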
Upvotes: 1