Reputation: 1033
I have actually solved the problem, but I am looking for advice for a more elegant / pandas-orientated solution.
I have a pandas dataframe of linkedin followers with a date field. The data looks like this:
Date Sponsored followers Organic followers Total followers
0 2021-05-30 0 105 105
1 2021-05-31 0 128 128
2 2021-06-01 0 157 157
3 2021-06-02 0 171 171
4 2021-06-03 0 133 133
I have a second dataframe that contains the start and end dates for paid social campaigns. What I have done is create a list of tuples from this dataframe, where the first element in the tuple is the start date, and the second is the end date, i converted these dates to datetimes as such:
[(datetime.date(2021, 7, 8), datetime.date(2021, 7, 9)),
(datetime.date(2021, 7, 12), datetime.date(2021, 7, 13)),
(datetime.date(2021, 7, 13), datetime.date(2021, 7, 14)),
(datetime.date(2021, 7, 14), datetime.date(2021, 7, 15)),
(datetime.date(2021, 7, 16), datetime.date(2021, 7, 18)),
(datetime.date(2021, 7, 19), datetime.date(2021, 7, 21)),
(datetime.date(2021, 7, 30), datetime.date(2021, 8, 2)),
(datetime.date(2021, 7, 30), datetime.date(2021, 8, 2)),
(datetime.date(2021, 7, 30), datetime.date(2021, 8, 2)),
(datetime.date(2021, 8, 9), datetime.date(2021, 8, 12)),
(datetime.date(2021, 8, 12), datetime.date(2021, 8, 15)),
(datetime.date(2021, 9, 3), datetime.date(2021, 9, 7)),
(datetime.date(2021, 10, 22), datetime.date(2021, 11, 21)),
(datetime.date(2021, 10, 29), datetime.date(2021, 11, 10)),
(datetime.date(2021, 10, 29), datetime.date(2021, 11, 2)),
(datetime.date(2021, 11, 3), datetime.date(2021, 11, 4)),
(datetime.date(2021, 11, 5), datetime.date(2021, 11, 8)),
(datetime.date(2021, 11, 9), datetime.date(2021, 11, 12)),
(datetime.date(2021, 11, 12), datetime.date(2021, 11, 16)),
(datetime.date(2021, 11, 11), datetime.date(2021, 11, 12)),
(datetime.date(2021, 11, 25), datetime.date(2021, 11, 27)),
(datetime.date(2021, 11, 26), datetime.date(2021, 11, 28)),
(datetime.date(2021, 12, 8), datetime.date(2021, 12, 11))]
In order to create a new column in my main dataframe (which is a count of how many campaigns falls on any given day), I loop through each row in my dataframe, and then each element in my list using the following code:
is_campaign = []
for date in df['Date']:
count = 0
for date_range in campaign_dates:
if date_range[0] <= date <= date_range[1]:
count += 1
is_campaign.append(count)
df['campaign'] = is_campaign
Which gives the following result:
df[df['campaign']!=0]
Date Sponsored followers Organic followers Total followers campaign
39 2021-07-08 0 160 160 1
40 2021-07-09 17 166 183 1
43 2021-07-12 0 124 124 1
44 2021-07-13 16 138 154 2
45 2021-07-14 22 158 180 2
... ... ... ... ... ...
182 2021-11-28 31 202 233 1
192 2021-12-08 28 357 385 1
193 2021-12-09 29 299 328 1
194 2021-12-10 23 253 276 1
195 2021-12-11 25 163 188 1
Any advice on how this could be done in a more efficient way, and specifically using pandas functionality would be appreciated.
Upvotes: 0
Views: 34
Reputation: 2541
My idea would be to use your second DataFrame alone to count the number of campaigns by date, and finally put the numbers into your first DataFrame. In this way you only go through your list of date-ranges once (or twice if you also take the counting step into account).
Expand your list of date-ranges into list of dates. Note that dates that occur N times represents N campaigns on that date.
dates = [
start_date + datetime.timedelta(day)
for start_date, end_date in date_ranges
for day in range((end_date - start_date).days + 1)
]
Then do the counting.
from collections import Counter
date_counts = Counter(dates)
Finally, put the numbers in.
df1['campaign'] = df1['Date'].map(pd.Series(date_counts))
Upvotes: 1