Reputation: 103
I am occasionally given a DataFrame with many N/A values.
In these cases there are redundant rows: for every X value there is only one Y value. I would therefore like to merge the two "example1" rows into one row (as shown in the image) by combining the "context" column with the measurement column names (M1, M2, ... Mn).
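The merge relies on the premise that each X maps to exactly one Y, which can be checked before collapsing any rows. A minimal sketch, assuming columns named X, context, M1, M2, M3 and Y (the sample values are only illustrative, and one_y_per_x is a hypothetical name):

```python
import numpy as np
import pandas as pd

# Illustrative frame with the assumed column layout
df = pd.DataFrame({
    'X': ['example1', 'example1'],
    'context': ['a', 'b'],
    'M1': [0.1, np.nan],
    'M2': [np.nan, 0.2],
    'M3': [np.nan, 0.3],
    'Y': [0.5, 0.5],
})

# True when every X value is paired with exactly one distinct Y value,
# which is what makes collapsing the rows per X safe
one_y_per_x = df.groupby('X')['Y'].nunique().eq(1).all()
print(one_y_per_x)
```

Note that nunique ignores NaN, so an X whose Y values are all missing also makes the check fail.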
How might one do this with pandas dataframe functions?
Thanks.
Upvotes: 0
Views: 47
Reputation: 563
You could use a join. DataFrame.join takes lsuffix and rsuffix parameters, so it is easiest to use those; if you need a prefix instead, you could rename the columns manually.
Create your DataFrame
import numpy as np
import pandas as pd
df = pd.DataFrame({'X': ['example1', 'example1'], 'context': ['a', 'b'], 'M1': [0.1, np.nan], 'M2': [np.nan, 0.2], 'M3': [np.nan, 0.3], 'Y': [0.5, 0.5]}, columns=['X', 'context', 'M1', 'M2', 'M3', 'Y'])
Solution
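# Split the rows by context, keyed on the shared (X, Y) pair
dfa = df[df['context'] == 'a'].set_index(['X', 'Y']).drop('context', axis=1)
dfb = df[df['context'] == 'b'].set_index(['X', 'Y']).drop('context', axis=1)
# Join side by side (suffixing the clashing M columns), then drop every column containing NaN
dfa.join(dfb, how='left', lsuffix='_a', rsuffix='_b').dropna(axis=1)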
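The join approach above can be generalized to any number of contexts, with the context placed in front as a prefix (a_M1) instead of a suffix. A rough sketch, assuming every context shares the same (X, Y) keys; it swaps the lsuffix/rsuffix parameters for DataFrame.add_prefix plus functools.reduce, and parts and merged are illustrative names:

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'X': ['example1', 'example1'],
    'context': ['a', 'b'],
    'M1': [0.1, np.nan],
    'M2': [np.nan, 0.2],
    'M3': [np.nan, 0.3],
    'Y': [0.5, 0.5],
})

# One frame per context, keyed on (X, Y), with the context as a column prefix
parts = [
    group.drop(columns='context').set_index(['X', 'Y']).add_prefix(f'{ctx}_')
    for ctx, group in df.groupby('context')
]

# Join all parts side by side; the prefixes are unique, so no suffixes are needed
merged = reduce(lambda left, right: left.join(right, how='outer'), parts)

# Drop measurement columns that are empty for every (X, Y) pair
merged = merged.dropna(axis=1, how='all')
print(merged)
```

Using how='all' rather than the default how='any' keeps a column that is filled for some X values but missing for others, which matters once the frame holds more than one X.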
Upvotes: 1
Reputation: 294488
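import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['a', .1, np.nan, np.nan, .5],
        ['b', np.nan, .2, .3, .5],
    ],
    index=['example1', 'example1'],
    columns=['context', 'M1', 'M2', 'M3', 'Y'],
)
# Move context into the index, stack the measurement columns into rows,
# then unstack context and the measurement name together as columns
d1 = df.set_index('context', append=True).stack().unstack([1, 2])
# Drop all-NaN columns (stack() may keep NaNs in newer pandas versions)
d1 = d1.dropna(axis=1, how='all')
# Flatten the (context, measurement) column pairs into names like 'a_M1'
d1.columns = d1.columns.map('_'.join)
d1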
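Because Y is stacked along with the measurements in the answer above, the result carries both a_Y and b_Y holding the same value. If Y should appear only once (it is constant per X), one variation is to keep it in the row index. A sketch, assuming the index is named X (the original frame leaves it unnamed); wide is an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['a', .1, np.nan, np.nan, .5],
        ['b', np.nan, .2, .3, .5],
    ],
    index=pd.Index(['example1', 'example1'], name='X'),
    columns=['context', 'M1', 'M2', 'M3', 'Y'],
)

# Keep Y in the row index so it is not stacked with the measurements
wide = (
    df.set_index(['Y', 'context'], append=True)  # index levels: X, Y, context
    .stack()                                     # measurement names become level 3
    .unstack([2, 3])                             # context + measurement become columns
    .dropna(axis=1, how='all')                   # drop all-NaN columns if stack() kept NaNs
)
# Flatten the (context, measurement) column pairs into names like 'a_M1'
wide.columns = wide.columns.map('_'.join)
print(wide)
```

The result is indexed by (X, Y) with one column per observed context/measurement pair.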
Upvotes: 1