Reputation: 2365
I have a pandas dataframe data-
Round Number Date
1 7/4/2018 20:00
1 8/4/2018 16:00
1 8/4/2018 20:00
1 9/4/2018 20:00
Now I want to create a new dataframe which has two columns
['Date' ,'flag']
The Date column will have the dates of the range of dates in the data dataframe(in the actual data the dates are in the range of 7/4/2018 8:00:00 PM
to 27/05/2018 19:00
so the date column in the new dataframe will have dates from 1/4/2018 to 30/05/2018 since 7/4/2018 8:00:00 PM
is in the month of April so we will include the whole month of April and similarly since 27/05/2018 is in May so we include dates from 1/05/2018 t0 30/05/2018.
In the flag column we put 1 if that particular date was there in the old dataframe.
Output(partial)-
Date Flag
1/4/2018 0
2/4/2018 0
3/4/2018 0
4/4/2018 0
5/4/2018 0
6/4/2018 0
7/4/2018 1
8/4/2018 1
and so on...
Upvotes: 1
Views: 360
Reputation: 18377
I would use np.where()
to address this issue. Furthermore, I'm working to improve the answer by setting the dateranges from old_df
to be input of new_df
import pandas as pd
import numpy as np
old_df = pd.DataFrame({'date':['4/7/2018 20:00','4/8/2018 20:00'],'value':[1,2]})
old_df['date'] = pd.to_datetime(old_df['date'],infer_datetime_format=True)
new_df = pd.DataFrame({'date':pd.date_range(start='4/1/2018',end='5/30/2019',freq='d')})
new_df['flag'] = np.where(new_df['date'].dt.date.astype(str).isin(old_df['date'].dt.date.astype(str).tolist()),1,0)
print(new_df.head(10))
Output:
date flag
0 2018-04-01 0
1 2018-04-02 0
2 2018-04-03 0
3 2018-04-04 0
4 2018-04-05 0
5 2018-04-06 0
6 2018-04-07 1
7 2018-04-08 1
8 2018-04-09 0
9 2018-04-10 0
Improved version, full code:
import pandas as pd
import numpy as np
old_df = pd.DataFrame({'date':['4/7/2018 20:00','4/8/2018 20:00','5/30/2018 20:00'],'value':[1,2,3]})
old_df['date'] = pd.to_datetime(old_df['date'],infer_datetime_format=True)
if old_df['date'].min().month < 10:
start_date = pd.to_datetime(
("01/0"+str(old_df['date'].min().month)+"/"+str(old_df['date'].min().year)))
else:
start_date = pd.to_datetime(
("01/"+str(old_df['date'].min().month)+"/"+str(old_df['date'].min().year)))
end_date = old_df['date'].max()
end_date = pd.to_datetime(old_df['date'].max())
new_df = pd.DataFrame({'date':pd.date_range(start=start_date,end=end_date,freq='d')})
new_df['flag'] = np.where(new_df['date'].dt.date.astype(str).isin(old_df['date'].dt.date.astype(str).tolist()),1,0)
Upvotes: 1