I have a dataframe df that looks similar to this:
identity Start End week
E 6/18/2020 7/2/2020 1
E 6/18/2020 7/2/2020 2
2D 7/18/2020 8/1/2020 1
2D 7/18/2020 8/1/2020 2
A1 9/6/2020 9/20/2020 1
A1 9/6/2020 9/20/2020 2
The problem is that when I extracted the data, I only got the overall Start and End dates for each identity, repeated on every row, but the data is actually broken down by week. All identities always have the same number of weeks (sometimes 5 or 6, but always the same for every identity). I want Start and End to be weekly: when the first week ends, I add 7 days, and the next week starts where the previous one ended. The expected result would be:
identity Start End week
E 6/18/2020 6/25/2020 1
E 6/25/2020 7/2/2020 2
2D 7/18/2020 7/25/2020 1
2D 7/25/2020 8/1/2020 2
A1 9/6/2020 9/13/2020 1
A1 9/13/2020 9/20/2020 2
I tried a simple approach: I created a column of sevens and added it to Start to get the end of each week, but I got this error: Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting n, use n * obj.freq
Then I would have set each following Start to the End minus seven days, but I don't know how to get around this error. Any help would be much appreciated.
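The error comes from adding plain integers to datetime values; pandas expects the number of days to be expressed as a timedelta. A minimal sketch of the sevens-column idea with that fix, using one row of the question's data (the `sevens` Series is a stand-in for the column described above):

```python
import pandas as pd

# One Start value from the question, parsed to datetime64
start = pd.Series(pd.to_datetime(["6/18/2020"], format="%m/%d/%Y"))

# Stand-in for the "sevens" column; convert the integers to day offsets
# instead of adding them directly to the datetimes
sevens = pd.Series([7])
end = start + pd.to_timedelta(sevens, unit="D")

print(end.iloc[0])  # 2020-06-25 00:00:00

# Going the other way (End minus seven days) works the same way
back = end - pd.Timedelta(days=7)
```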
Similar to your other question:
First convert to datetimes:
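import pandas as pd
# Parse the m/d/Y strings; assigning whole columns (rather than via .loc[:, ...]) ensures the dtype becomes datetime64
df[["Start", "End"]] = (df[["Start", "End"]]
                        .transform(pd.to_datetime, format="%m/%d/%Y"))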
df
identity Start End week
0 E 2020-06-18 2020-07-02 1
1 E 2020-06-18 2020-07-02 2
2 2D 2020-07-18 2020-08-01 1
3 2D 2020-07-18 2020-08-01 2
4 A1 2020-09-06 2020-09-20 1
5 A1 2020-09-06 2020-09-20 2
Each identity appears in two rows (one per week), so I take the first two dates from each date_range:
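from itertools import chain
import pandas as pd
# One row per identity: all rows of an identity share the same Start/End
result = df.drop_duplicates(subset="identity")
# For each identity, take the first two 7-day steps from Start (week 1 and week 2)
date_range = (
pd.date_range(start, end, freq="7D")[:2]
for start, end in zip(result.Start, result.End)
)
# Flatten the per-identity dates into one sequence aligned with df's rows
date_range = chain.from_iterable(date_range)
# Each week ends 7 days after its (new) Start
End = lambda df: df.Start.add(pd.Timedelta("7 days"))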
Create new dataframe:
df.assign(Start=list(date_range), End=End)
identity Start End week
0 E 2020-06-18 2020-06-25 1
1 E 2020-06-25 2020-07-02 2
2 2D 2020-07-18 2020-07-25 1
3 2D 2020-07-25 2020-08-01 2
4 A1 2020-09-06 2020-09-13 1
5 A1 2020-09-13 2020-09-20 2