Reputation: 467
I have a dataframe (in Python) as follows:
A B C D E F G H
0 T-1 2 3 - M-7 2 2
1 T-2 5 4 2 M-7 4 3
2 T-3 - - 3 M-7 9 4
3 T-4 6 - 4 M-7 - 9 5
4 T-5 - 1 5 M-7 8 6
I would like to replace the dashes (excluding those in column A and E) with NaN
.
I tried df.replace('-', np.nan)
.It ended up with replacing the entire cells of columns A and E as well. I guess I can use df[column_name].replace('-',np.nan)
but then I would have to apply it for each column separately.
Is there any other way, which is efficient and will work for any number of columns with similar restrictions?
Upvotes: 7
Views: 12297
Reputation: 2785
This should work.
df = pd.DataFrame({'A': list('abcde'),
'B': ['T-1', 'T-2', 'T-3', 'T-4', 'T-5'],
'C': ['a', '-', 'c', 'd', '-'],
'D': ['-', 'b', 'c', 'd', 'e'],
'E': ['M-7', 'M-7', 'M-7', 'M-7', 'M-7'],
'F': ['a', '-', 'c', '-', '-'],
'G': ['a', 'b', 'c', 'd', '-'],
'H': ['a', 'b', '-', 'd', '-']
})
df = df.astype(str)
s = df.applymap(lambda x: re.sub(r'^-$', str(np.NaN), x))
Output:
A B C D E F G H
0 a T-1 a nan M-7 a a a
1 b T-2 nan b M-7 nan b b
2 c T-3 c c M-7 c c nan
3 d T-4 d d M-7 nan d d
4 e T-5 nan e M-7 nan nan nan
Upvotes: 1
Reputation: 210852
In [18]: df[df.columns.drop(['A','E'])] = \
df[df.columns.drop(['A','E'])].replace('-', np.nan)
In [19]: df
Out[19]:
A B C D E F G H
0 a T-1 a NaN M-7 a a a
1 b T-2 NaN b M-7 NaN b b
2 c T-3 c c M-7 c c NaN
3 d T-4 d d M-7 NaN d d
4 e T-5 NaN e M-7 NaN NaN NaN
Upvotes: 2