Reputation: 422
This is probably a pretty simple question but I am very new to python so I'm not sure how to go about doing this. Let's say I have a pandas df of unique identifiers 101, 201, 301 and a monthly date range. I would like to combine these so that each identifier shows up once at each date (so if the monthly data were 12 months, the resulting data frame would be 36 rows).
id = {'unique':[101,201,301]}
id_df = pd.DataFrame(id)
month = pd.date_range(start = '2020-01-01', end = '2021-1-01', freq='M')
I know it's a pretty easy question but any help is greatly appreciated!
Upvotes: 2
Views: 652
Reputation: 765
id_df.merge(month.to_series(name="dates"),how='cross')
output:
unique dates
0 101 2020-01-31
1 101 2020-02-29
2 101 2020-03-31
3 101 2020-04-30
4 101 2020-05-31
5 101 2020-06-30
6 101 2020-07-31
7 101 2020-08-31
8 101 2020-09-30
9 101 2020-10-31
10 101 2020-11-30
11 101 2020-12-31
12 201 2020-01-31
13 201 2020-02-29
…
Upvotes: 0
Reputation: 261850
You could use itertools.product:
import itertools
pd.DataFrame(itertools.product([101,201,301],
pd.date_range(start = '2020-01-01', end = '2021-1-01', freq='M')),
columns=['unique', 'dates'],
)
output:
unique dates
0 101 2020-01-31
1 101 2020-02-29
2 101 2020-03-31
3 101 2020-04-30
4 101 2020-05-31
5 101 2020-06-30
6 101 2020-07-31
7 101 2020-08-31
8 101 2020-09-30
9 101 2020-10-31
10 101 2020-11-30
11 101 2020-12-31
12 201 2020-01-31
13 201 2020-02-29
…
Upvotes: 2
Reputation: 18315
You can first assign a new column dates
with each row being month
and then explode
it:
id_df["dates"] = [month] * len(id_df)
id_df = id_df.explode("dates", ignore_index=True)
where we ignore_index
so that resultant index is 0..N-1
.
after first operation:
unique dates
0 101 DatetimeIndex(["2020-01-31", "2020-02-29", "20...
1 201 DatetimeIndex(["2020-01-31", "2020-02-29", "20...
2 301 DatetimeIndex(["2020-01-31", "2020-02-29", "20...
and then
>>> id_df
unique dates
0 101 2020-01-31
1 101 2020-02-29
2 101 2020-03-31
3 101 2020-04-30
4 101 2020-05-31
5 101 2020-06-30
6 101 2020-07-31
7 101 2020-08-31
8 101 2020-09-30
9 101 2020-10-31
10 101 2020-11-30
11 101 2020-12-31
12 201 2020-01-31
13 201 2020-02-29
14 201 2020-03-31
15 201 2020-04-30
16 201 2020-05-31
17 201 2020-06-30
18 201 2020-07-31
19 201 2020-08-31
20 201 2020-09-30
21 201 2020-10-31
22 201 2020-11-30
23 201 2020-12-31
24 301 2020-01-31
25 301 2020-02-29
26 301 2020-03-31
27 301 2020-04-30
28 301 2020-05-31
29 301 2020-06-30
30 301 2020-07-31
31 301 2020-08-31
32 301 2020-09-30
33 301 2020-10-31
34 301 2020-11-30
35 301 2020-12-31
(you can do id_df = id_df.sort_values("dates", ignore_index=True, kind="mergesort")
afterwards to sort with respect to dates in a stable manner.)
Upvotes: 3