Groupby count based on year and specific condition

Question

I have a dataframe as shown below

Tenancy_ID        Unit_ID        Tenancy_End_Date
1                 A              2012-09-06 11:34:15
2                 B              2013-09-08 10:35:18
3                 A              2014-09-06 11:34:15
4                 C              2014-09-06 11:34:15
5                 B              2015-09-06 11:34:15
6                 A              2014-09-06 11:34:15
5                 A              2015-09-06 11:34:15
7                 A              2019-09-06 11:34:15
4                 C              2014-01-06 11:34:15
5                 C              2014-05-06 11:34:15

From the above I would like to generate below dataframe

Expected Output:

Unit_ID    NoC_2012   NoC_2013  NoC_2014   NoC_2015   NoC_2016   NoC_2017  NoC_2018   NoC_2019
A          1          0         2          1          0          0         0          1
B          0          1         0          1          0          0         0          0
C          0          0         3          0          0          0         0          0

Where NoC = Number of Contracts.

For example NoC_2012 = Number of Contracts in year 2012 based on Tenancy_End_Date

Note: Tenancy_ID is not unique. Tenancy_ID with Unit_ID is unique. ie there are same Tenancy_ID with different Unit_ID.

ansev · Accepted Answer

First convert to datetime your columns:

df['Tenancy_End_Date']= pd.to_datetime(df['Tenancy_End_Date'])

then use DataFrame.pivot_table with Series.dt.year to get the Years:

new_df = (df.assign(Year  = df['Tenancy_End_Date'].dt.year)
           #if you don't want convert to datetime use this instead
           #.assign(Year  = pd.to_datetime(df['Tenancy_End_Date']).dt.year)
            .pivot_table(index = 'Unit_ID',
                         columns = 'Year',
                         values = 'Tenancy_ID'
                         ,aggfunc = 'count',
                         fill_value =0)
            .add_prefix('NoC_')
            .reset_index()
            .rename_axis(columns = None))
print(new_df)
  Unit_ID  NoC_2012  NoC_2013  NoC_2014  NoC_2015  NoC_2019
0       A         1         0         2         1         1
1       B         0         1         0         1         0
2       C         0         0         3         0         0

Groupby count based on year and specific condition

Answers (1)

Related Questions