Reputation: 117
I have a DataFrame with a column named 'ID' that has duplicate observations. Each 'ID' row has one or more 'Article' values columns. I want to transpose the whole dataframe grouping by 'ID' adding new columns at the same row of a unique 'ID'. Order is important
My DataFrame:
ID Article Article_2
1 Banana NaN
2 Apple NaN
1 Apple Coconut
3 Tomatoe Coconut
1 Pineapple Tropical
2 Banana Coconut
4 Apple Coconut
5 Apple Coconut
3 Apple Pineapple
My code (by @Erfan):
dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2'])
dfn = dfn.pivot_table(index='ID',
columns=dfn.groupby('ID')['value'].cumcount().add(1),
values='value',
aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')
Output:
Article_1 Article_2 Article_3 Article_4 Article_5 Article_6
0001 Banana Apple Pineapple NaN Coconut Tropical
0002 Apple Banana NaN Coconut NaN NaN
0003 Tomatoe Apple Coconut Pineapple NaN NaN
0004 Apple Coconut NaN NaN NaN NaN
0005 Apple Coconut NaN NaN NaN NaN
At first row, Article_4 is NaN and Article_5 and 6 have values. It should be Article_4 Coconut, Article_5 Tropical and Article_6 NaN. At the second same, Article_3 is NaN and Article_4 is a valid value. It should be Article_3 valid and rest (4,5,6) NaNs
Needed output:
Article_1 Article_2 Article_3 Article_4 Article_5 Article_6
0001 Banana Apple Pineapple Coconut Tropical NaN
0002 Apple Banana Coconut NaN NaN NaN
0003 Tomatoe Apple Coconut Pineapple NaN NaN
0004 Apple Coconut NaN NaN NaN NaN
0005 Apple Coconut NaN NaN NaN NaN
Upvotes: 1
Views: 68
Reputation: 863301
Add DataFrame.dropna
after melt
for remove missing rows by value
column:
dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2']).dropna(subset=['value'])
dfn = dfn.pivot_table(index='ID',
columns=dfn.groupby('ID')['value'].cumcount().add(1),
values='value',
aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')
print (dfn)
Article_1 Article_2 Article_3 Article_4 Article_5
1 Banana Apple Pineapple Coconut Tropical
2 Apple Banana Coconut NaN NaN
3 Tomatoe Apple Coconut Pineapple NaN
4 Apple Coconut NaN NaN NaN
5 Apple Coconut NaN NaN NaN
If need all columns use a bit modified justify
function:
dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2'])
dfn = dfn.pivot_table(index='ID',
columns=dfn.groupby('ID')['value'].cumcount().add(1),
values='value',
aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')
#https://stackoverflow.com/a/44559180/2901002
def justify(a, invalid_val=0, axis=1, side='left'):
"""
Justifies a 2D array
Parameters
----------
A : ndarray
Input array to be justified
axis : int
Axis along which justification is to be made
side : str
Direction of justification. It could be 'left', 'right', 'up', 'down'
It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.
"""
if invalid_val is np.nan:
mask = pd.notna(a)
else:
mask = a!=invalid_val
justified_mask = np.sort(mask,axis=axis)
if (side=='up') | (side=='left'):
justified_mask = np.flip(justified_mask,axis=axis)
out = np.full(a.shape, invalid_val, dtype=object)
if axis==1:
out[justified_mask] = a[mask]
else:
out.T[justified_mask.T] = a.T[mask.T]
return out
dfn = pd.DataFrame(justify(dfn.values, invalid_val=np.nan, axis=1, side='left'),
index=dfn.index, columns=dfn.columns)
print (dfn)
Article_1 Article_2 Article_3 Article_4 Article_5 Article_6
1 Banana Apple Pineapple Coconut Tropical NaN
2 Apple Banana Coconut NaN NaN NaN
3 Tomatoe Apple Coconut Pineapple NaN NaN
4 Apple Coconut NaN NaN NaN NaN
5 Apple Coconut NaN NaN NaN NaN
Upvotes: 1