tfr950
tfr950

Reputation: 422

Repeat each identifier over date range

This is probably a pretty simple question but I am very new to python so I'm not sure how to go about doing this. Let's say I have a pandas df of unique identifiers 101, 201, 301 and a monthly date range. I would like to combine these so that each identifier shows up once at each date (so if the monthly data were 12 months, the resulting data frame would be 36 rows).

id = {'unique':[101,201,301]}
id_df = pd.DataFrame(id)
month = pd.date_range(start = '2020-01-01', end = '2021-1-01', freq='M')

I know it's a pretty easy question but any help is greatly appreciated!

Upvotes: 2

Views: 652

Answers (3)

G.G
G.G

Reputation: 765

id_df.merge(month.to_series(name="dates"),how='cross')

output:

    unique  dates
0   101 2020-01-31
1   101 2020-02-29
2   101 2020-03-31
3   101 2020-04-30
4   101 2020-05-31
5   101 2020-06-30
6   101 2020-07-31
7   101 2020-08-31
8   101 2020-09-30
9   101 2020-10-31
10  101 2020-11-30
11  101 2020-12-31
12  201 2020-01-31
13  201 2020-02-29
…

Upvotes: 0

mozway
mozway

Reputation: 261850

You could use itertools.product:

import itertools
pd.DataFrame(itertools.product([101,201,301],
                               pd.date_range(start = '2020-01-01', end = '2021-1-01', freq='M')),
             columns=['unique', 'dates'],
            )

output:

    unique  dates
0   101 2020-01-31
1   101 2020-02-29
2   101 2020-03-31
3   101 2020-04-30
4   101 2020-05-31
5   101 2020-06-30
6   101 2020-07-31
7   101 2020-08-31
8   101 2020-09-30
9   101 2020-10-31
10  101 2020-11-30
11  101 2020-12-31
12  201 2020-01-31
13  201 2020-02-29
…

Upvotes: 2

Mustafa Aydın
Mustafa Aydın

Reputation: 18315

You can first assign a new column dates with each row being month and then explode it:

id_df["dates"] = [month] * len(id_df)
id_df = id_df.explode("dates", ignore_index=True)

where we ignore_index so that resultant index is 0..N-1.

after first operation:

   unique                                              dates
0     101  DatetimeIndex(["2020-01-31", "2020-02-29", "20...
1     201  DatetimeIndex(["2020-01-31", "2020-02-29", "20...
2     301  DatetimeIndex(["2020-01-31", "2020-02-29", "20...

and then

>>> id_df

    unique      dates
0      101 2020-01-31
1      101 2020-02-29
2      101 2020-03-31
3      101 2020-04-30
4      101 2020-05-31
5      101 2020-06-30
6      101 2020-07-31
7      101 2020-08-31
8      101 2020-09-30
9      101 2020-10-31
10     101 2020-11-30
11     101 2020-12-31
12     201 2020-01-31
13     201 2020-02-29
14     201 2020-03-31
15     201 2020-04-30
16     201 2020-05-31
17     201 2020-06-30
18     201 2020-07-31
19     201 2020-08-31
20     201 2020-09-30
21     201 2020-10-31
22     201 2020-11-30
23     201 2020-12-31
24     301 2020-01-31
25     301 2020-02-29
26     301 2020-03-31
27     301 2020-04-30
28     301 2020-05-31
29     301 2020-06-30
30     301 2020-07-31
31     301 2020-08-31
32     301 2020-09-30
33     301 2020-10-31
34     301 2020-11-30
35     301 2020-12-31

(you can do id_df = id_df.sort_values("dates", ignore_index=True, kind="mergesort") afterwards to sort with respect to dates in a stable manner.)

Upvotes: 3

Related Questions