Reputation: 25812
The code below generates a df:
import pandas as pd
from datetime import datetime as dt
import numpy as np
dates = [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None]
strings1 = ['A', 'B',None, 'C']
strings2 = [None, 'B','C', 'C']
strings3 = ['A', 'B','C', None]
vals = [1.,2.,np.nan, 4.]
df = pd.DataFrame(dict(zip(['A','B','C','D','E'],
[strings1, dates, strings2, strings3, vals])))
+---+------+---------------------+------+------+-----+
| | A | B | C | D | E |
+---+------+---------------------+------+------+-----+
| 0 | A | 2014-01-02 02:00:00 | None | A | 1 |
| 1 | B | 2014-01-02 03:00:00 | B | B | 2 |
| 2 | None | 2014-01-02 04:00:00 | C | C | NaN |
| 3 | C | NaT | C | None | 4 |
+---+------+---------------------+------+------+-----+
I would like to replace every None (a real Python None, not a str) in it with '' (an empty string).
The expected df is
+---+---+---------------------+---+---+-----+
| | A | B | C | D | E |
+---+---+---------------------+---+---+-----+
| 0 | A | 2014-01-02 02:00:00 | | A | 1 |
| 1 | B | 2014-01-02 03:00:00 | B | B | 2 |
| 2 | | 2014-01-02 04:00:00 | C | C | NaN |
| 3 | C | NaT | C | | 4 |
+---+---+---------------------+---+---+-----+
What I did is:
df = df.replace([None], [''], regex=True)
But I got
+---+---+---------------------+---+------+---+
| | A | B | C | D | E |
+---+---+---------------------+---+------+---+
| 0 | A | 1388628000000000000 | | A | 1 |
| 1 | B | 1388631600000000000 | B | B | 2 |
| 2 | | 1388635200000000000 | C | C | |
| 3 | C | | C | | 4 |
+---+---+---------------------+---+------+---+
NaT and NaN are replaced as well, which I don't want. How can I achieve this correctly and efficiently?
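As a quick check of the premise, the sketch below rebuilds the frame and inspects how each missing value is actually stored. It assumes the text columns are object dtype (forced here with dtype=object, since newer pandas versions may infer a dedicated string dtype instead); the names text and kinds are only illustrative and not part of the original code.

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd


def text(values):
    # Assumption: force object dtype so None is stored as a real None.
    return pd.Series(values, dtype=object)


df = pd.DataFrame({
    "A": text(["A", "B", None, "C"]),
    "B": [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None],
    "C": text([None, "B", "C", "C"]),
    "D": text(["A", "B", "C", None]),
    "E": [1.0, 2.0, np.nan, 4.0],
})

# Name of the Python type stored in one "missing" cell of each kind of column.
kinds = {
    "A": type(df.loc[2, "A"]).__name__,  # text column
    "B": type(df.loc[3, "B"]).__name__,  # datetime column
    "E": type(df.loc[2, "E"]).__name__,  # float column
}
print(df.dtypes)
print(kinds)
```

Only the object columns hold a real None; the datetime and float columns store their own missing markers, which is why a blanket replacement touches them too.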
Upvotes: 29
Views: 90131
Reputation: 584
You can use the replace method. Note that the call below turns empty strings into NaN, which is the reverse of what the question asks for.
df.replace('', np.nan, inplace=True)
Upvotes: 3
Reputation: 583
This is sufficient
df.fillna("",inplace=True)
df
Out[142]:
A B C D E
0 A 2014-01-02 02:00:00 A 1
1 B 2014-01-02 03:00:00 B B 2
2 2014-01-02 04:00:00 C C
3 C C 4
Edit 2021-07-26: expanded the answer following @dWitty's comment.
If you want to keep the NaT and NaN values in the non-text columns, you only need to fill the missing values in your text columns. In your example these are A, C and D.
Pass a dict of replacement values keyed by column name; the value can differ per column. For your case you just need to construct the dict:
# replacement values for NA (None), keyed by column
# equivalent to: values = {"A": "", "C": "", "D": ""}
values = {col: "" for col in ["A", "C", "D"]}
df.fillna(value=values, inplace=True)
df
Out[142]:
A B C D E
0 A 2014-01-02 02:00:00 A 1.0
1 B 2014-01-02 03:00:00 B B 2.0
2 2014-01-02 04:00:00 C C NaN
3 C NaT C 4.0
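A minimal sketch of the per-column dict, returning a new frame instead of using inplace. The frame is rebuilt with object-dtype text columns (an assumption, so that None is a real None), and the "missing" fill value for column D is purely hypothetical, chosen to show that values can differ per column.

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": pd.Series(["A", "B", None, "C"], dtype=object),
    "B": [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None],
    "C": pd.Series([None, "B", "C", "C"], dtype=object),
    "D": pd.Series(["A", "B", "C", None], dtype=object),
    "E": [1.0, 2.0, np.nan, 4.0],
})

# One fill value per column; columns missing from the dict are not filled.
# "missing" is a hypothetical value used only for illustration.
values = {"A": "", "C": "", "D": "missing"}
filled = df.fillna(value=values)

print(filled)
```

Because B and E are not keys of the dict, their NaT and NaN entries are left as they are, and df itself is not modified.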
Upvotes: 39
Reputation: 27896
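For those who are trying to replace None, and not just np.nan (which is covered in here), here is a one-liner that checks every cell:
default_value = ""
df = df.applymap(lambda x: default_value if x is None else x)
The check has to run per cell: df.apply passes a whole column (a Series) to the function, so "x is not None" is always true there and nothing gets replaced. On pandas 2.1 and later, DataFrame.map is the non-deprecated name for applymap.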
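A hedged sketch of the cell-by-cell check wrapped in a helper. replace_none is a hypothetical name, the text columns are assumed to be object dtype, and the helper picks DataFrame.map when the installed pandas has it (2.1 and later) and falls back to applymap otherwise.

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd


def replace_none(frame, default=""):
    """Return a copy of frame with every cell that is exactly None replaced."""
    def swap(x):
        return default if x is None else x

    # DataFrame.map was added in pandas 2.1; older versions only have applymap.
    if hasattr(pd.DataFrame, "map"):
        return frame.map(swap)
    return frame.applymap(swap)


df = pd.DataFrame({
    "A": pd.Series(["A", "B", None, "C"], dtype=object),
    "B": [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None],
    "C": pd.Series([None, "B", "C", "C"], dtype=object),
    "D": pd.Series(["A", "B", "C", None], dtype=object),
    "E": [1.0, 2.0, np.nan, 4.0],
})

result = replace_none(df)
print(result)
```

Since NaT and NaN are not the None object, the identity test skips them. The function is called once per cell, so this can be slow on large frames.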
Upvotes: 2
Reputation: 4647
Since the relevant columns you wish to alter are all objects, you could just specify this with the dtype attribute (for completeness I added in string and unicode) and use fillna.
So:
for c in df:
    if str(df[c].dtype) in ('object', 'string_', 'unicode_'):
        # assign back instead of calling fillna(inplace=True) on a column selection
        df[c] = df[c].fillna(value='')
This will leave numeric and date columns unaffected.
To see the data types for all columns:
df.dtypes
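The same idea can be sketched without an explicit loop by letting select_dtypes pick the columns. This assumes the text columns are object dtype (forced here with dtype=object); object_cols is an illustrative name.

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": pd.Series(["A", "B", None, "C"], dtype=object),
    "B": [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None],
    "C": pd.Series([None, "B", "C", "C"], dtype=object),
    "D": pd.Series(["A", "B", "C", None], dtype=object),
    "E": [1.0, 2.0, np.nan, 4.0],
})

# Pick the object-dtype columns, then fill only those.
object_cols = df.select_dtypes(include=["object"]).columns
df[object_cols] = df[object_cols].fillna("")

print(df)
```

Note that fillna treats NaN stored inside an object column as missing too, so this is only equivalent to replacing None when the object columns contain no NaN of their own.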
Upvotes: 6
Reputation: 393863
It looks like None is being promoted to NaN, so you cannot use replace as usual. The following works:
In [126]:
mask = df.applymap(lambda x: x is None)
cols = df.columns[(mask).any()]
for col in df[cols]:
    df.loc[mask[col], col] = ''
df
Out[126]:
A B C D E
0 A 2014-01-02 02:00:00 A 1
1 B 2014-01-02 03:00:00 B B 2
2 2014-01-02 04:00:00 C C NaN
3 C NaT C 4
So we generate a mask of the None values using applymap, then use this mask to iterate over each column of interest and set the values using the boolean mask.
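A variant of the mask approach, sketched under the assumption that a real None can only live in object-dtype columns: it builds the mask one object column at a time, so numeric and datetime columns are never scanned. The names object_cols and is_none are illustrative.

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": pd.Series(["A", "B", None, "C"], dtype=object),
    "B": [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None],
    "C": pd.Series([None, "B", "C", "C"], dtype=object),
    "D": pd.Series(["A", "B", "C", None], dtype=object),
    "E": [1.0, 2.0, np.nan, 4.0],
})

# Only object columns can hold a real None (assumption stated above).
object_cols = [col for col in df.columns if df[col].dtype == object]

for col in object_cols:
    # Boolean mask marking the cells that are exactly None.
    is_none = np.array([value is None for value in df[col]], dtype=bool)
    df.loc[is_none, col] = ""

print(df)
```

Because the test is an identity check against None, a NaN stored in an object column would be left as it is, unlike with fillna.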
Upvotes: 17