ubuntu_noob
ubuntu_noob

Reputation: 2365

Inserting flag on occurence of date

I have a pandas dataframe data-

Round Number    Date
1           7/4/2018 20:00
1           8/4/2018 16:00
1           8/4/2018 20:00
1           9/4/2018 20:00

Now I want to create a new dataframe which has two columns

['Date' ,'flag']

The Date column will have the dates of the range of dates in the data dataframe(in the actual data the dates are in the range of 7/4/2018 8:00:00 PM to 27/05/2018 19:00 so the date column in the new dataframe will have dates from 1/4/2018 to 30/05/2018 since 7/4/2018 8:00:00 PM is in the month of April so we will include the whole month of April and similarly since 27/05/2018 is in May so we include dates from 1/05/2018 t0 30/05/2018.

In the flag column we put 1 if that particular date was there in the old dataframe.

Output(partial)-

Date  Flag

1/4/2018 0
2/4/2018 0
3/4/2018 0
4/4/2018 0
5/4/2018 0
6/4/2018 0
7/4/2018 1
8/4/2018 1

and so on...

Upvotes: 1

Views: 360

Answers (1)

Celius Stingher
Celius Stingher

Reputation: 18377

I would use np.where() to address this issue. Furthermore, I'm working to improve the answer by setting the dateranges from old_df to be input of new_df

    import pandas as pd
    import numpy as np 
    old_df = pd.DataFrame({'date':['4/7/2018 20:00','4/8/2018 20:00'],'value':[1,2]})
    old_df['date'] = pd.to_datetime(old_df['date'],infer_datetime_format=True)
    new_df = pd.DataFrame({'date':pd.date_range(start='4/1/2018',end='5/30/2019',freq='d')})
    new_df['flag'] = np.where(new_df['date'].dt.date.astype(str).isin(old_df['date'].dt.date.astype(str).tolist()),1,0)
    print(new_df.head(10))
Output:

            date  flag
    0 2018-04-01     0
    1 2018-04-02     0
    2 2018-04-03     0
    3 2018-04-04     0
    4 2018-04-05     0
    5 2018-04-06     0
    6 2018-04-07     1
    7 2018-04-08     1
    8 2018-04-09     0
    9 2018-04-10     0

Edit:

Improved version, full code:

import pandas as pd
import numpy as np 
old_df = pd.DataFrame({'date':['4/7/2018 20:00','4/8/2018 20:00','5/30/2018 20:00'],'value':[1,2,3]})
old_df['date'] = pd.to_datetime(old_df['date'],infer_datetime_format=True)
   
if old_df['date'].min().month < 10:
    start_date = pd.to_datetime(
                            ("01/0"+str(old_df['date'].min().month)+"/"+str(old_df['date'].min().year)))
else:
    start_date = pd.to_datetime(
                            ("01/"+str(old_df['date'].min().month)+"/"+str(old_df['date'].min().year)))
end_date = old_df['date'].max()

end_date = pd.to_datetime(old_df['date'].max())
new_df = pd.DataFrame({'date':pd.date_range(start=start_date,end=end_date,freq='d')})
              
new_df['flag'] = np.where(new_df['date'].dt.date.astype(str).isin(old_df['date'].dt.date.astype(str).tolist()),1,0)

Upvotes: 1

Related Questions