Reputation: 4398
I have a dataset where I would like to de aggregate the values into their own unique rows as well as perform a pivot.
Data
ID type Q1 24 Q2 24
AA hello 1 1
AA hi 2 1
AA bye 1 0
AA ok 0 1
BB hello 1 0
BB hi 1 0
BB bye 0 1
BB ok 1 1
Desired
ID date type
AA Q1 24 hello
AA Q1 24 hi
AA Q1 24 hi
AA Q1 24 bye
AA Q2 24 hello
AA Q2 24 hi
AA Q2 24 ok
BB Q1 24 hello
BB Q1 24 hi
BB Q1 24 ok
BB Q1 24 bye
BB Q2 24 ok
Doing
import pandas as pd
import janitor
df.pivot_longer(
index = 'ID',
names_to = ('date', '.value'),
names_pattern = r"(Q\d+)_?(.+)",
sort_by_appearance = True)
I am currently researching, any suggestion is appreciated.
Upvotes: 2
Views: 51
Reputation: 11650
here is one way to do it
#melt the dataframe on ID and type
df2=df.melt(id_vars=['ID','type'], var_name='date')
# the value column has the repeated values, which is used to duplicate the
# indexes of the DF2, then merged to repeat the rows as many times as the
# value is repeated. finally, its sorting, reindexing and dropping unneeded
#columns
(df2.index.repeat(df2['value']).to_frame()
.merge(df2, left_index=True, right_index=True)
.sort_values(['ID','date'])
) .reset_index().drop(columns=['index','value',0])
for instance, for ID:AA, Q1-24, the value is 2, the df2.index.repeat this index twice, then merge with the df2.
ID type date
0 AA hello Q1 24
1 AA hi Q1 24
2 AA hi Q1 24
3 AA bye Q1 24
4 AA hello Q2 24
5 AA hi Q2 24
6 AA ok Q2 24
7 BB hello Q1 24
8 BB hi Q1 24
9 BB ok Q1 24
10 BB bye Q2 24
11 BB ok Q2 24
Upvotes: 1
Reputation: 29742
IIUC, one way using pandas.Index.repeat
with pandas.concat
:
df = df.set_index(["ID", "type"])
new_df = pd.concat([pd.Series(c, index=df.index.repeat(df[c]))
for c in df]).reset_index(name="date")
# Optionally, sort values
new_df = new_df.sort_values(["ID", "date"], ignore_index=True)
Output:
ID type date
0 AA hello Q1 24
1 AA hi Q1 24
2 AA hi Q1 24
3 AA bye Q1 24
4 AA hello Q2 24
5 AA hi Q2 24
6 AA ok Q2 24
7 BB hello Q1 24
8 BB hi Q1 24
9 BB ok Q1 24
10 BB bye Q2 24
11 BB ok Q2 24
Upvotes: 2