Reputation: 133
I have a dataset in this form:
Agent ID Month values
101 Jan-17 2
101 Feb-17 4
101 Mar-17 3
101 Apr-17 8
101 May-17 12
101 Jun-17 3
101 Dec-17 1
102 Jan-17 2
102 Feb-17 3
102 Mar-17 7
102 Apr-17 3
102 May-17 2
102 Jun-17 11
102 Sep-17 2
102 Oct-17 2
102 Nov-17 1
102 Dec-17 4
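I want to reshape it like this, so that every row also carries that agent's value for each month as extra columns (0 where the agent has no record for that month):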
Agent ID Month values Jan-17 Feb-17 Mar-17 Apr-17 May-17 Jun-17 Sep-17 Oct-17 Nov-17 Dec-17
101 Jan-17 2 2 4 3 8 12 3 0 0 0 1
101 Feb-17 4 2 4 3 8 12 3 0 0 0 1
101 Mar-17 3 2 4 3 8 12 3 0 0 0 1
101 Apr-17 8 2 4 3 8 12 3 0 0 0 1
101 May-17 12 2 4 3 8 12 3 0 0 0 1
101 Jun-17 3 2 4 3 8 12 3 0 0 0 1
101 Dec-17 1 2 4 3 8 12 3 0 0 0 1
102 Jan-17 2 2 3 7 3 2 11 2 2 1 4
102 Feb-17 3 2 3 7 3 2 11 2 2 1 4
102 Mar-17 7 2 3 7 3 2 11 2 2 1 4
102 Apr-17 3 2 3 7 3 2 11 2 2 1 4
102 May-17 2 2 3 7 3 2 11 2 2 1 4
102 Jun-17 11 2 3 7 3 2 11 2 2 1 4
102 Sep-17 2 2 3 7 3 2 11 2 2 1 4
102 Oct-17 2 2 3 7 3 2 11 2 2 1 4
102 Nov-17 1 2 3 7 3 2 11 2 2 1 4
102 Dec-17 4 2 3 7 3 2 11 2 2 1 4
Upvotes: 2
Views: 67
Reputation: 25239
It is also doable with pd.crosstab, using apply to ffill and bfill on groupby.
I used the line from WenYoBen that converts df.Month through datetime into 'YYYY-MM' strings, so the columns keep the order the OP wants:
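import pandas as pd
# Month as 'YYYY-MM' strings so the columns sort chronologically
df.Month=pd.to_datetime(df.Month,format='%b-%y').dt.strftime('%Y-%m')
df1 = pd.crosstab(index=[df.AgentID, df.Month, df['values']], columns=df.Month, values=df['values'], aggfunc='first')
# group_keys=False keeps the original index, so reset_index also works on pandas 2.x
df1 = df1.groupby(level=0, group_keys=False).apply(lambda x: x.ffill().bfill()).fillna(0).reset_index()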
Out[2103]:
Month AgentID Month values 2017-01 2017-02 2017-03 2017-04 2017-05 \
0 101 2017-01 2 2.0 4.0 3.0 8.0 12.0
1 101 2017-02 4 2.0 4.0 3.0 8.0 12.0
2 101 2017-03 3 2.0 4.0 3.0 8.0 12.0
3 101 2017-04 8 2.0 4.0 3.0 8.0 12.0
4 101 2017-05 12 2.0 4.0 3.0 8.0 12.0
5 101 2017-06 3 2.0 4.0 3.0 8.0 12.0
6 101 2017-12 1 2.0 4.0 3.0 8.0 12.0
7 102 2017-01 2 2.0 3.0 7.0 3.0 2.0
8 102 2017-02 3 2.0 3.0 7.0 3.0 2.0
9 102 2017-03 7 2.0 3.0 7.0 3.0 2.0
10 102 2017-04 3 2.0 3.0 7.0 3.0 2.0
11 102 2017-05 2 2.0 3.0 7.0 3.0 2.0
12 102 2017-06 11 2.0 3.0 7.0 3.0 2.0
13 102 2017-09 2 2.0 3.0 7.0 3.0 2.0
14 102 2017-10 2 2.0 3.0 7.0 3.0 2.0
15 102 2017-11 1 2.0 3.0 7.0 3.0 2.0
16 102 2017-12 4 2.0 3.0 7.0 3.0 2.0
Month 2017-06 2017-09 2017-10 2017-11 2017-12
0 3.0 0.0 0.0 0.0 1.0
1 3.0 0.0 0.0 0.0 1.0
2 3.0 0.0 0.0 0.0 1.0
3 3.0 0.0 0.0 0.0 1.0
4 3.0 0.0 0.0 0.0 1.0
5 3.0 0.0 0.0 0.0 1.0
6 3.0 0.0 0.0 0.0 1.0
7 11.0 2.0 2.0 1.0 4.0
8 11.0 2.0 2.0 1.0 4.0
9 11.0 2.0 2.0 1.0 4.0
10 11.0 2.0 2.0 1.0 4.0
11 11.0 2.0 2.0 1.0 4.0
12 11.0 2.0 2.0 1.0 4.0
13 11.0 2.0 2.0 1.0 4.0
14 11.0 2.0 2.0 1.0 4.0
15 11.0 2.0 2.0 1.0 4.0
16 11.0 2.0 2.0 1.0 4.0
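For reference, here is a self-contained sketch of the same crosstab idea, rebuilt from the question's sample data. It assumes the ID column is named AgentID (as in the code above) and swaps apply for groupby(...).transform, which returns a result indexed like its input. The names ct, filled and result are mine, not from the original post.

```python
import pandas as pd

# Sample data from the question; the column name AgentID is an assumption.
df = pd.DataFrame({
    "AgentID": [101] * 7 + [102] * 10,
    "Month": ["Jan-17", "Feb-17", "Mar-17", "Apr-17", "May-17", "Jun-17", "Dec-17",
              "Jan-17", "Feb-17", "Mar-17", "Apr-17", "May-17", "Jun-17",
              "Sep-17", "Oct-17", "Nov-17", "Dec-17"],
    "values": [2, 4, 3, 8, 12, 3, 1, 2, 3, 7, 3, 2, 11, 2, 2, 1, 4],
})

# 'YYYY-MM' strings sort chronologically, unlike 'Jan-17' style labels.
df["Month"] = pd.to_datetime(df["Month"], format="%b-%y").dt.strftime("%Y-%m")

# One row per original (AgentID, Month, values) triple, one column per month.
# Each row holds a single non-missing cell: the value for its own month.
ct = pd.crosstab(
    index=[df["AgentID"], df["Month"], df["values"]],
    columns=df["Month"],
    values=df["values"],
    aggfunc="first",
)

# Spread each agent's values down and up its own rows, then use 0 for
# months in which the agent has no record at all.
filled = ct.groupby(level=0).transform(lambda g: g.ffill().bfill()).fillna(0)

result = filled.astype(int).reset_index()
print(result)
```

The astype(int) step is optional; it only restores integer columns after the missing values are gone.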
Upvotes: 0
Reputation: 323236
I think that is pivot first, then merge:
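# Month as 'YYYY-MM' strings so the pivoted columns sort chronologically
df.Month=pd.to_datetime(df.Month,format='%b-%y').dt.strftime('%Y-%m')
# keyword arguments: positional arguments to pivot are rejected by pandas 2.x
s=df.pivot(index='AgentID',columns='Month',values='values').fillna(0).reset_index()
# merging on the shared AgentID column repeats each agent's wide row on all of its rows
df=df.merge(s)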
df
Out[876]:
AgentID Month values ... 2017-10 2017-11 2017-12
0 101 2017-01 2 ... 0.0 0.0 1.0
1 101 2017-02 4 ... 0.0 0.0 1.0
2 101 2017-03 3 ... 0.0 0.0 1.0
3 101 2017-04 8 ... 0.0 0.0 1.0
4 101 2017-05 12 ... 0.0 0.0 1.0
5 101 2017-06 3 ... 0.0 0.0 1.0
6 101 2017-12 1 ... 0.0 0.0 1.0
7 102 2017-01 2 ... 2.0 1.0 4.0
8 102 2017-02 3 ... 2.0 1.0 4.0
9 102 2017-03 7 ... 2.0 1.0 4.0
10 102 2017-04 3 ... 2.0 1.0 4.0
11 102 2017-05 2 ... 2.0 1.0 4.0
12 102 2017-06 11 ... 2.0 1.0 4.0
13 102 2017-09 2 ... 2.0 1.0 4.0
14 102 2017-10 2 ... 2.0 1.0 4.0
15 102 2017-11 1 ... 2.0 1.0 4.0
16 102 2017-12 4 ... 2.0 1.0 4.0
[17 rows x 13 columns]
More info (the intermediate pivoted frame):
s
Out[878]:
Month AgentID 2017-01 2017-02 ... 2017-10 2017-11 2017-12
0 101 2.0 4.0 ... 0.0 0.0 1.0
1 102 2.0 3.0 ... 2.0 1.0 4.0
[2 rows x 11 columns]
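If you want to try this end to end, the following is a minimal runnable sketch of pivot then merge on the question's sample data. It assumes the ID column is named AgentID, as in the code above; the names wide and result, and the final cast back to integers, are my additions for illustration.

```python
import pandas as pd

# Sample data from the question; the column name AgentID is an assumption.
df = pd.DataFrame({
    "AgentID": [101] * 7 + [102] * 10,
    "Month": ["Jan-17", "Feb-17", "Mar-17", "Apr-17", "May-17", "Jun-17", "Dec-17",
              "Jan-17", "Feb-17", "Mar-17", "Apr-17", "May-17", "Jun-17",
              "Sep-17", "Oct-17", "Nov-17", "Dec-17"],
    "values": [2, 4, 3, 8, 12, 3, 1, 2, 3, 7, 3, 2, 11, 2, 2, 1, 4],
})

# 'YYYY-MM' strings sort chronologically, unlike 'Jan-17' style labels.
df["Month"] = pd.to_datetime(df["Month"], format="%b-%y").dt.strftime("%Y-%m")

# One row per agent, one column per month; months without a record become 0.
wide = (
    df.pivot(index="AgentID", columns="Month", values="values")
      .fillna(0)
      .astype(int)
      .reset_index()
)
wide.columns.name = None  # drop the leftover 'Month' label on the column axis

# Attach each agent's wide row to every original row of that agent.
result = df.merge(wide, on="AgentID", how="left")
print(result)
```

Passing on="AgentID" explicitly makes the join key visible; a bare df.merge(s) picks the same key because AgentID is the only column the two frames share.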
Upvotes: 5