Starbucks
Starbucks

Reputation: 1568

Create Indicator Column for Date Ranges in Another Pandas Dataframe

I need to create an indicator column based on dfmap date ranges for all tags in df.

import pandas as pd

df = pd.DataFrame({
    'date': ['2019-04-19','2019-04-20','2019-04-21', '2019-04-22',
             '2019-10-01','2019-10-02','2019-10-03', '2019-10-04'],
    'tag': ['ID F', 'ID F', 'ID F', 'ID F',
             'ID B', 'ID B', 'ID B', 'ID B'],
    'value': ['1', '2', '3', '4',
             '1', '3', '5', '7']})

df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)

dfmap = pd.DataFrame({
    'start_date': ['2019-04-20','2019-10-03'],
    'end_date': ['2019-04-21','2019-10-04'],
    'tag': ['ID F', 'ID B']})

print(df)
print(dfmap)

        date   tag value
0 2019-04-19  ID F     1
1 2019-04-20  ID F     2
2 2019-04-21  ID F     3
3 2019-04-22  ID F     4
4 2019-10-01  ID B     1
5 2019-10-02  ID B     3
6 2019-10-03  ID B     5
7 2019-10-04  ID B     7
   start_date    end_date   tag
0  2019-04-20  2019-04-21  ID F
1  2019-10-03  2019-10-04  ID B

Desired Dataframe:

print(desired_df)

        date   tag value indicator
0 2019-04-19  ID F     1         0
1 2019-04-20  ID F     2         1
2 2019-04-21  ID F     3         1
3 2019-04-22  ID F     4         0 
4 2019-10-01  ID B     1         0 
5 2019-10-02  ID B     3         0
6 2019-10-03  ID B     5         1
7 2019-10-04  ID B     7         1

Upvotes: 0

Views: 505

Answers (3)

Jed Mitten
Jed Mitten

Reputation: 76

The full logic of how the date and tag identify an indicator is unclear to me. I determined it to be

  1. if the range exists
  2. check if the tag is the same
  3. if in-range and tag matches, indicator is 1
  4. else indicator is 0

I like to write functions when facing this sort of problem.

# first setup the dfmap to explicitly be Timestamp as you did with df
dfmap['start_date'] = pd.to_datetime(dfmap['start_date'])
dfmap['end_date'] = pd.to_datetime(dfmap['end_date'])

# write your logic for the range indicators
def get_indicator(row, df):
    dt = row.date
    tag = row.tag
    
    for idx, map_row in df.iterrows():
        if map_row.start_date <= dt <= map_row.end_date:
            if row.tag == map_row.tag:
                return 1
    return 0

# apply
df['indicator'] = df.apply(lambda x: get_indicator(x, dfmap), axis=1)

# print(df)
#         date   tag value  indicator
# 0 2019-04-19  ID F     1          0
# 1 2019-04-20  ID F     2          1
# 2 2019-04-21  ID F     3          1
# 3 2019-04-22  ID F     4          0
# 4 2019-10-01  ID B     1          0
# 5 2019-10-02  ID B     3          0
# 6 2019-10-03  ID B     5          1
# 7 2019-10-04  ID B     7          1

Upvotes: 1

Anurag Dabas
Anurag Dabas

Reputation: 24324

try via date_range()+agg() method and isin()+astype() method:

s=dfmap.agg(lambda x:pd.date_range(x['start_date'],x['end_date']).normalize(),axis=1).explode().unique()
df['indicator']=df['date'].isin(s).astype(int)

Note: you can also use apply() in place of agg() method

OR

via date_range()+zip()

s=[[*pd.date_range(x,y).normalize()] for x,y in zip(dfmap['start_date'],dfmap['end_date'])]
s=pd.Series(s).explode().unique()
df['indicator']=df['date'].isin(s).view('i1')

output of df:

    date        tag     value   indicator
0   2019-04-19  ID F    1       0
1   2019-04-20  ID F    2       1
2   2019-04-21  ID F    3       1
3   2019-04-22  ID F    4       0
4   2019-10-01  ID B    1       0
5   2019-10-02  ID B    3       0
6   2019-10-03  ID B    5       1
7   2019-10-04  ID B    7       1

Upvotes: 0

Pygirl
Pygirl

Reputation: 13339

Just by writing simple logic:

g = lambda x: pd.to_datetime(x)
g = lambda x: pd.to_datetime(x)
df['date'] = g(df['date'])
dfmap[['start_date', 'end_date']].apply(g)

conditions = [((df['tag'].eq(idx)) & (df['date'].between(start, end))) for idx, start, end in zip(dfmap['tag'], dfmap['start_date'], dfmap['end_date'])]
cond = conditions[0] | conditions[1]
df['indicator'] = np.where(cond,1,0)

df:

        date   tag value  indicator
0 2019-04-19  ID F     1          0
1 2019-04-20  ID F     2          1
2 2019-04-21  ID F     3          1
3 2019-04-22  ID F     4          0
4 2019-10-01  ID B     1          0
5 2019-10-02  ID B     3          0
6 2019-10-03  ID B     5          1
7 2019-10-04  ID B     7          1

Upvotes: 1

Related Questions