Reputation: 47
I have a too big dataframe with x, y, year, month, how I can create a new column with the mean of days by month as INPUT
X/Y/YEAR/MONTH
1/2/1990/2
1/2/1990/5
1/2/1990/6
4/8/2020/1
4/8/2020/2
4/8/2020/3
4/8/2020/4
3/7/2020/8
I tried using conditional if per month but so doing too big the code, is there other way to do that? EXPECTED OUTPUT
X/Y/YEAR/MONTH/DAYS
1/2/1990/2/14
1/2/1990/5/15.5
1/2/1990/6/
4/8/2020/1/15.5
4/8/2020/2/14
4/8/2020/3/15.5
4/8/2020/4/15
3/7/2020/8/15.5
Upvotes: 1
Views: 39
Reputation: 16683
You can create a date series d
and use MonthEnd
with dt.day
and divide by 2
:
from pandas.tseries.offsets import MonthEnd
d = pd.to_datetime(df['YEAR'].astype(str) + '-' + df['MONTH'].astype(str) + '-01')
df['Days'] = (d + MonthEnd(1)).dt.day / 2
df
Out[1]:
X Y YEAR MONTH Days
0 1 2 1990 2 14.0
1 1 2 1990 5 15.5
2 1 2 1990 6 15.0
3 4 8 2020 1 15.5
4 4 8 2020 2 14.5
5 4 8 2020 3 15.5
6 4 8 2020 4 15.0
7 3 7 2020 8 15.5
Upvotes: 1