Reputation: 13
companies_id transaction_month count
0 2020-10-01 3
1 2020-10-01 5
1 2020-11-01 5
1 2020-12-01 18
1 2021-01-01 8
I want the result to be like
companies_id transaction_month count first_month
0 2020-10-01 3
1 2020-10-01 5 2020-10-01
1 2020-11-01 5 2020-10-01
1 2020-12-01 18 2020-10-01
1 2021-01-01 8 2020-10-01
This is my data set I want to add a new column called "first month" that should contain the value from transaction month column where the corresponding count is >=5.
for example : In case of companies_id 1:
first 5 or more transaction occurred on 2020-10-01 therefore "first month" column should contain 2020-10-01 throughout i.e to all rows with companies_id as 1.
Upvotes: 0
Views: 134
Reputation: 863521
Use Series.where
for replace transaction_month
to NaN
if not >=5
count and then use GroupBy.transform
with GroupBy.first
for first non missing values per groups to new column:
df['transaction_month'] = pd.to_datetime(df['transaction_month'])
print (df['transaction_month'].where(df['count'] >= 5))
0 NaT
1 2020-10-01
2 2020-11-01
3 2020-12-01
4 2021-01-01
Name: transaction_month, dtype: datetime64[ns]
df['first_month'] = (df['transaction_month'].where(df['count'] >= 5)
.groupby(df['companies_id'])
.transform('first'))
print (df)
companies_id transaction_month count first_month
0 0 2020-10-01 3 NaT
1 1 2020-10-01 5 2020-10-01
2 1 2020-11-01 5 2020-10-01
3 1 2020-12-01 18 2020-10-01
4 1 2021-01-01 8 2020-10-01
Upvotes: 1