Reputation: 15466
I am processing inbound user data. I receive a DataFrame h that is supposed to contain only floats but has some strings:
>>> h = pd.DataFrame(np.random.rand(3, 2), columns=['a', 'b'])
>>> h.loc[0, 'a'] = 'bad'
>>> h.loc[1, 'b'] = 'robot'
>>> h
          a         b
0       bad  0.747314
1  0.921919     robot
2  0.754256  0.664455
I process the data and set the strings to np.nan (I realize np.nan is a float, but this is just to illustrate):
>>> hh = h.copy()
>>> hh.loc[0, 'a'] = np.nan
>>> hh.loc[1, 'b'] = np.nan
>>> hh
          a         b
0       NaN  0.747314
1  0.921919       NaN
2  0.754256  0.664455
I have a DataFrame (or a dict) with the expected values:
>>> g = pd.DataFrame({'a': 'foo', 'b': 'bar'}, index=h.index)
>>> g
     a    b
0  foo  bar
1  foo  bar
2  foo  bar
I use it to fill in the cells where the bad data was:
>>> hh.fillna(g)
          a         b
0       foo  0.747314
1  0.921919       bar
2  0.754256  0.664455
I need the result to show the received value alongside the expected one. So the result should be:
>>> magic(hh, g)
                  a                   b
0  rec=bad; exp=foo            0.747314
1          0.921919  rec=robot; exp=bar
2          0.754256            0.664455
How can I create such a result?
Upvotes: 1
Views: 71
Reputation: 862396
You can convert the values you do not need to NaN with DataFrame.where, join what remains together with the strings, and finally replace the resulting NaNs with the original values:
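# mask of the cells that were set to NaN
m = hh.isna()
# build the 'rec=...; exp=...' strings only in the masked cells, then restore the other cells from h
df = ('rec=' + h.where(m) + '; exp=' + g.where(m)).fillna(h)
print(df)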
                  a                   b
0  rec=bad; exp=foo            0.440508
1          0.525949  rec=robot; exp=bar
2          0.337586            0.414336
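If a reusable function is preferred, the same idea could be wrapped up roughly as in the sketch below. The name annotate_bad_values is hypothetical, and this variant swaps in astype(str) plus DataFrame.where rather than relying on NaN propagating through the string concatenation. It also assumes the bad cells are found with pd.to_numeric(..., errors='coerce'), which is only one possible way to build hh, and it uses fixed sample values instead of random ones.

```python
import pandas as pd


def annotate_bad_values(received, cleaned, expected):
    """Hypothetical helper: label every cell that is NaN in `cleaned`.

    Labelled cells become 'rec=<received>; exp=<expected>';
    all other cells keep their value from `received`.
    """
    bad = cleaned.isna()
    # Build the label for every cell, then keep it only where the data was bad.
    labels = 'rec=' + received.astype(str) + '; exp=' + expected.astype(str)
    return received.where(~bad, labels)


# Sample data mirroring the question (fixed values, not random).
h = pd.DataFrame({'a': ['bad', 0.921919, 0.754256],
                  'b': [0.747314, 'robot', 0.664455]})

# Assumption: bad cells are anything that cannot be parsed as a number.
hh = h.apply(pd.to_numeric, errors='coerce')

# Expected values, broadcast from scalars over the index of h.
g = pd.DataFrame({'a': 'foo', 'b': 'bar'}, index=h.index)

result = annotate_bad_values(h, hh, g)
print(result)
```

Because the labels are selected with where instead of fillna, a cell that was already NaN in h would still get a label ('rec=nan; exp=...') rather than being left empty; whether that is desirable depends on the data.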
Upvotes: 5