Reputation: 2871
I have a dataframe as shown below
Tenancy_ID Unit_ID Tenancy_End_Date
1 A 2012-09-06 11:34:15
2 B 2013-09-08 10:35:18
3 A 2014-09-06 11:34:15
4 C 2014-09-06 11:34:15
5 B 2015-09-06 11:34:15
6 A 2014-09-06 11:34:15
5 A 2015-09-06 11:34:15
7 A 2019-09-06 11:34:15
4 C 2014-01-06 11:34:15
5 C 2014-05-06 11:34:15
From the above I would like to generate below dataframe
Expected Output:
Unit_ID NoC_2012 NoC_2013 NoC_2014 NoC_2015 NoC_2016 NoC_2017 NoC_2018 NoC_2019
A 1 0 2 1 0 0 0 1
B 0 1 0 1 0 0 0 0
C 0 0 3 0 0 0 0 0
Where NoC = Number of Contracts.
For example NoC_2012 = Number of Contracts in year 2012 based on Tenancy_End_Date
Note: Tenancy_ID is not unique. Tenancy_ID with Unit_ID is unique. ie there are same Tenancy_ID with different Unit_ID.
Upvotes: 0
Views: 42
Reputation: 30920
First convert to datetime your columns:
df['Tenancy_End_Date']= pd.to_datetime(df['Tenancy_End_Date'])
then use DataFrame.pivot_table
with Series.dt.year
to get the Years:
new_df = (df.assign(Year = df['Tenancy_End_Date'].dt.year)
#if you don't want convert to datetime use this instead
#.assign(Year = pd.to_datetime(df['Tenancy_End_Date']).dt.year)
.pivot_table(index = 'Unit_ID',
columns = 'Year',
values = 'Tenancy_ID'
,aggfunc = 'count',
fill_value =0)
.add_prefix('NoC_')
.reset_index()
.rename_axis(columns = None))
print(new_df)
Unit_ID NoC_2012 NoC_2013 NoC_2014 NoC_2015 NoC_2019
0 A 1 0 2 1 1
1 B 0 1 0 1 0
2 C 0 0 3 0 0
Upvotes: 2