Reputation: 69
I have a dataframe which looks like:
0 target_year ID v1 v2
1 2000 1 0.3 1
2 2000 2 1.2 4
...
10 2001 1 3 2
11 2001 2 2 2
And I would like the following output:
0 ID v1_1 v2_1 v1_2 v2_2
1 1 0.3 1 3 2
2 2 1.2 4 2 2
Do you have any idea how to do that?
Upvotes: 5
Views: 485
Reputation: 88236
You could use pd.pivot_table, with the GroupBy.cumcount of ID as columns. Then we can use a list comprehension with f-strings to merge the MultiIndex header into a single level:
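import pandas as pd
# Occurrence counter within each ID: 1 for the first row of an ID, 2 for the second, ...
cols = df.groupby('ID').ID.cumcount() + 1
df_piv = pd.pivot_table(data = df[['v1','v2']],
                        index = df.ID,
                        columns = cols)
# Flatten the MultiIndex header, e.g. ('v1', 1) -> 'v1_1'
df_piv.columns = [f'{i}_{j}' for i,j in df_piv.columns]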
v1_1 v1_2 v2_1 v2_2
ID
1 0.3 3.0 1 2
2 1.2 2.0 4 2
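To see what the cumcount step contributes, here is a minimal sketch. The sample frame is an assumption rebuilt from the four rows visible in the question (the elided rows are left out), and the name occurrence is only illustrative:

```python
import pandas as pd

# Assumed sample data: only the rows shown in the question
df = pd.DataFrame({
    'target_year': [2000, 2000, 2001, 2001],
    'ID': [1, 2, 1, 2],
    'v1': [0.3, 1.2, 3.0, 2.0],
    'v2': [1, 4, 2, 2],
})

# cumcount numbers the rows inside each ID group starting at 0,
# so adding 1 gives 1 for the first row of an ID, 2 for the second, ...
occurrence = df.groupby('ID').cumcount() + 1
print(occurrence.tolist())
```

These numbers become the suffixes in v1_1, v1_2 and so on, so they reflect row order within each ID rather than the year itself. Sorting by target_year first is probably a good idea if the rows may be out of order.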
Upvotes: 6
Reputation: 150735
If your data cover only two years, you can also use merge:
cols = ['ID','v1', 'v2']
df[df.target_year.eq(2000)][cols].merge(df[df.target_year.eq(2001)][cols],
on='ID',
suffixes=['_1','_2'])
Output
ID v1_1 v2_1 v1_2 v2_2
0 1 0.3 1 3.0 2
1 2 1.2 4 2.0 2
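A self-contained sketch of the same idea, with the sample frame assumed from the rows shown in the question. The validate argument is an addition that is not part of the snippet above; it makes merge raise if an ID appears more than once within a year:

```python
import pandas as pd

# Assumed sample data: only the rows shown in the question
df = pd.DataFrame({
    'target_year': [2000, 2000, 2001, 2001],
    'ID': [1, 2, 1, 2],
    'v1': [0.3, 1.2, 3.0, 2.0],
    'v2': [1, 4, 2, 2],
})

cols = ['ID', 'v1', 'v2']
first = df.loc[df.target_year.eq(2000), cols]
second = df.loc[df.target_year.eq(2001), cols]

# validate='one_to_one' raises a MergeError if an ID is duplicated in either year
merged = first.merge(second, on='ID', suffixes=('_1', '_2'), validate='one_to_one')
print(merged)
```

Note that merge defaults to an inner join, so an ID that is missing from one of the two years is dropped from the result; pass how='outer' if you would rather keep it with NaN values.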
Upvotes: 0
Reputation: 862611
Use GroupBy.cumcount for a counter column, reshape with DataFrame.set_index and DataFrame.unstack, and finally flatten the MultiIndex columns with a list comprehension and f-strings:
g = df.groupby('ID').ID.cumcount() + 1
df = df.drop('target_year', axis=1).set_index(['ID', g]).unstack()
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print(df)
ID v1_1 v1_2 v2_1 v2_2
0 1 0.3 3.0 1 2
1 2 1.2 2.0 4 2
Upvotes: 2