Reputation: 11
I have a dataframe looking like this :
DATA | 0 | 1 | 2 | 3
XX1 | | 0A | 0B |
XX2 | | 0C | |
XX3 | | 1A | 0A | 1C
I would like to transform it that way :
id | DATA
0A | XX1
0A | XX3
0B | XX1
0C | XX2
1A | XX3
1C | XX3
Every id must appear each time it's appearing in the dataframe, that's the reason why there is 2 "0A"
I tried this so far, but I'm getting nowhere.
la = []
for b in range(2,10):
la.append(df.groupby(by=[b])[b,"DATA"].agg(np.sum).rename(columns={b:"id"}))
df = pd.concat(la, ignore_index=True)
Anyone got a suggestion?
Upvotes: 0
Views: 43
Reputation: 42946
You can use DataFrame.melt
for this:
(
df.melt(id_vars="DATA", value_name="id")
.dropna()
.sort_values("id", ignore_index=True)
.drop(columns=["variable"])
)
DATA id
0 XX1 0A
1 XX3 0A
2 XX1 0B
3 XX2 0C
4 XX3 1A
5 XX3 1C
Upvotes: 1