Reputation: 93794
My table:
In [15]: csv=u"""a,a,,a
....: b,b,,b
....: c,c,,c
....: """
In [18]: df = pd.read_csv(io.StringIO(csv), header=None)
I want to fill the empty column with 'UNKNOWN':
In [19]: df
Out[19]:
0 1 2 3
0 a a NaN a
1 b b NaN b
2 c c NaN c
In [20]: df.fillna({2:'UNKNOWN'})
But I got this error:
ValueError: could not convert string to float: UNKNOWN
Upvotes: 10
Views: 28844
Reputation: 91
import numpy as np
import pandas as pd

# Column 2 holds only NaN, so pandas gives it a float dtype
df = pd.DataFrame({0:['a','b','c'], 1:['a','b','c'], 2:np.nan, 3:['a','b','c']})
df
0 1 2 3
0 a a NaN a
1 b b NaN b
2 c c NaN c
You can do this by selecting the column with square brackets, calling fillna on it, and assigning the result back:
df[2] = df[2].fillna('UNKNOWN')
If you print df, it will look like this:
0 1 2 3
0 a a UNKNOWN a
1 b b UNKNOWN b
2 c c UNKNOWN c
You can also fill the empty cells in every column at once:
df = df.fillna('UNKNOWN')
Upvotes: 0
Reputation: 353059
Column 2 probably has a float dtype:
>>> df
0 1 2 3
0 a a NaN a
1 b b NaN b
2 c c NaN c
>>> df.dtypes
0 object
1 object
2 float64
3 object
dtype: object
Hence the problem. If you don't mind converting the whole frame to object, you could do this:
>>> df.astype(object).fillna("UNKNOWN")
0 1 2 3
0 a a UNKNOWN a
1 b b UNKNOWN b
2 c c UNKNOWN c
Depending on whether there's non-string data, you might want to be more selective about converting column dtypes, and/or specify the dtypes on read, but the above should work anyhow.
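To illustrate the "specify the dtypes on read" option, here is a minimal sketch. It assumes the same CSV text as in the question and passes dtype=object to read_csv so the empty column is never inferred as float64. The name filled is only for illustration.

```python
import io

import pandas as pd

# Same data as in the question: the third field is empty on every row.
csv = "a,a,,a\nb,b,,b\nc,c,,c\n"

# dtype=object keeps every column as generic Python objects, so the
# all-empty column is not inferred as float64.
df = pd.read_csv(io.StringIO(csv), header=None, dtype=object)

# The empty fields are still read as NaN; only the column dtype differs,
# so a string fill value no longer has to be converted to float.
filled = df.fillna({2: "UNKNOWN"})
print(filled)
```

This trades dtype inference for predictability: every column comes back as object, so it only suits frames where you don't need numeric dtypes from the reader.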
Update: if you have dtype information you want to preserve, then rather than converting everything to object and switching it back, I'd go the other way and fill only the columns you want to, either using a loop with fillna:
>>> df
0 1 2 3 4 5
0 0 a a NaN a NaN
1 1 b b NaN b NaN
2 2 c c NaN c NaN
>>> df.dtypes
0 int64
1 object
2 object
3 float64
4 object
5 float64
dtype: object
>>> for col in df.columns[pd.isnull(df).all()]:
... df[col] = df[col].astype(object).fillna("UNKNOWN")
...
>>> df
0 1 2 3 4 5
0 0 a a UNKNOWN a UNKNOWN
1 1 b b UNKNOWN b UNKNOWN
2 2 c c UNKNOWN c UNKNOWN
>>> df.dtypes
0 int64
1 object
2 object
3 object
4 object
5 object
dtype: object
Or (if you're using all), then maybe not even use fillna at all:
>>> df
0 1 2 3 4 5
0 0 a a NaN a NaN
1 1 b b NaN b NaN
2 2 c c NaN c NaN
>>> df[df.columns[pd.isnull(df).all()]] = "UNKNOWN"
>>> df
0 1 2 3 4 5
0 0 a a UNKNOWN a UNKNOWN
1 1 b b UNKNOWN b UNKNOWN
2 2 c c UNKNOWN c UNKNOWN
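For anyone who wants to try the selective approach end to end, here is a self-contained sketch. The six-column frame is my reconstruction of the one printed above, and fill_empty_columns is a hypothetical helper name, not something pandas provides. It converts only the all-NaN columns to object before filling, so the other dtypes are left alone.

```python
import numpy as np
import pandas as pd

# Reconstruction of the example frame: columns 3 and 5 contain only NaN.
df = pd.DataFrame({
    0: [0, 1, 2],
    1: ["a", "b", "c"],
    2: ["a", "b", "c"],
    3: [np.nan] * 3,
    4: ["a", "b", "c"],
    5: [np.nan] * 3,
})


def fill_empty_columns(frame, value="UNKNOWN"):
    """Return a copy in which every all-NaN column is filled with `value`.

    Hypothetical helper for illustration; only the empty columns are
    converted to object, so other columns keep their dtypes.
    """
    out = frame.copy()
    # Boolean mask over the columns: True where every cell is missing.
    empty = out.columns[out.isnull().all()]
    for col in empty:
        out[col] = out[col].astype(object).fillna(value)
    return out


result = fill_empty_columns(df)
print(result)
print(result.dtypes)
```

Working on a copy keeps the original frame untouched, which makes it easy to compare the dtypes before and after.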
Upvotes: 11
Reputation: 128958
As a workaround, just set the column directly. The upcast in fillna should work; the fact that it fails is a bug.
In [8]: df = pd.read_csv(io.StringIO(csv), header=None)
In [9]: df
Out[9]:
0 1 2 3
0 a a NaN a
1 b b NaN b
2 c c NaN c
In [10]: df[2] = 'foo'
In [11]: df
Out[11]:
0 1 2 3
0 a a foo a
1 b b foo b
2 c c foo c
In [12]: df.dtypes
Out[12]:
0 object
1 object
2 object
3 object
dtype: object
Upvotes: 4