ojp
ojp

Reputation: 1033

Compute new pandas column for the number of time a date intersects a list of date ranges

I have actually solved the problem, but I am looking for advice for a more elegant / pandas-orientated solution.

I have a pandas dataframe of linkedin followers with a date field. The data looks like this:

    Date    Sponsored followers Organic followers   Total followers
0   2021-05-30  0   105 105
1   2021-05-31  0   128 128
2   2021-06-01  0   157 157
3   2021-06-02  0   171 171
4   2021-06-03  0   133 133

I have a second dataframe that contains the start and end dates for paid social campaigns. What I have done is create a list of tuples from this dataframe, where the first element in the tuple is the start date, and the second is the end date, i converted these dates to datetimes as such:

[(datetime.date(2021, 7, 8), datetime.date(2021, 7, 9)),
 (datetime.date(2021, 7, 12), datetime.date(2021, 7, 13)),
 (datetime.date(2021, 7, 13), datetime.date(2021, 7, 14)),
 (datetime.date(2021, 7, 14), datetime.date(2021, 7, 15)),
 (datetime.date(2021, 7, 16), datetime.date(2021, 7, 18)),
 (datetime.date(2021, 7, 19), datetime.date(2021, 7, 21)),
 (datetime.date(2021, 7, 30), datetime.date(2021, 8, 2)),
 (datetime.date(2021, 7, 30), datetime.date(2021, 8, 2)),
 (datetime.date(2021, 7, 30), datetime.date(2021, 8, 2)),
 (datetime.date(2021, 8, 9), datetime.date(2021, 8, 12)),
 (datetime.date(2021, 8, 12), datetime.date(2021, 8, 15)),
 (datetime.date(2021, 9, 3), datetime.date(2021, 9, 7)),
 (datetime.date(2021, 10, 22), datetime.date(2021, 11, 21)),
 (datetime.date(2021, 10, 29), datetime.date(2021, 11, 10)),
 (datetime.date(2021, 10, 29), datetime.date(2021, 11, 2)),
 (datetime.date(2021, 11, 3), datetime.date(2021, 11, 4)),
 (datetime.date(2021, 11, 5), datetime.date(2021, 11, 8)),
 (datetime.date(2021, 11, 9), datetime.date(2021, 11, 12)),
 (datetime.date(2021, 11, 12), datetime.date(2021, 11, 16)),
 (datetime.date(2021, 11, 11), datetime.date(2021, 11, 12)),
 (datetime.date(2021, 11, 25), datetime.date(2021, 11, 27)),
 (datetime.date(2021, 11, 26), datetime.date(2021, 11, 28)),
 (datetime.date(2021, 12, 8), datetime.date(2021, 12, 11))]

In order to create a new column in my main dataframe (which is a count of how many campaigns falls on any given day), I loop through each row in my dataframe, and then each element in my list using the following code:

is_campaign = []

for date in df['Date']:
    count = 0
    for date_range in campaign_dates:
        if date_range[0] <= date <= date_range[1]:
            count += 1
    is_campaign.append(count)

df['campaign'] = is_campaign

Which gives the following result:

df[df['campaign']!=0]

Date    Sponsored followers Organic followers   Total followers campaign
39  2021-07-08  0   160 160 1
40  2021-07-09  17  166 183 1
43  2021-07-12  0   124 124 1
44  2021-07-13  16  138 154 2
45  2021-07-14  22  158 180 2
... ... ... ... ... ...
182 2021-11-28  31  202 233 1
192 2021-12-08  28  357 385 1
193 2021-12-09  29  299 328 1
194 2021-12-10  23  253 276 1
195 2021-12-11  25  163 188 1

Any advice on how this could be done in a more efficient way, and specifically using pandas functionality would be appreciated.

Upvotes: 0

Views: 34

Answers (1)

Raymond Kwok
Raymond Kwok

Reputation: 2541

My idea would be to use your second DataFrame alone to count the number of campaigns by date, and finally put the numbers into your first DataFrame. In this way you only go through your list of date-ranges once (or twice if you also take the counting step into account).

Expand your list of date-ranges into list of dates. Note that dates that occur N times represents N campaigns on that date.

dates = [
    start_date + datetime.timedelta(day) 
        for start_date, end_date in date_ranges 
            for day in range((end_date - start_date).days + 1)
]

Then do the counting.

from collections import Counter
date_counts = Counter(dates)

Finally, put the numbers in.

df1['campaign'] = df1['Date'].map(pd.Series(date_counts))

Upvotes: 1

Related Questions