Reputation: 617
I'm having difficulty coming up with a simple merge logic.
TimeSeries = A.merge(B, on = ['YEAR','MONTH'], how= 'left')
I have the following dataframe. A continuous month/year column from 2001-2018 (12 months X 17 yrs)
A:
|---------------------|------------------|
| Month | Year |
|---------------------|------------------|
| 12 | 2001 |
|---------------------|------------------|
| 01 | 2002 |
|---------------------|------------------|
| ...... | ..... |
|---------------------|------------------|
| 12 | 2018 |
|---------------------|------------------|
And the other data frame which has a unique ID, month, year and $ amount
B:
|-----------------|----------------|--------------|--------|
| Month | Year | ID | Amount |
|-----------------|----------------|--------------|--------|
| 05 | 2013 | 2 | 500K |
|-----------------|----------------|--------------|--------|
| 01 | 2002 | 1 | 200K |
|-----------------|----------------|--------------|--------|
| 12 | 2016 | 3 | 800K |
|-----------------|----------------|--------------|--------|
How would I be able to append/merge both dataframes so each ID from dataframe B has the entire time series from dataframe A. And it should merge on the correct month and year column.
Desired Output for ID. Each ID should have ~204 rows and if the month and year are same, then display the correct $ amount
|---------------------|------------------|--------------|-----------|
| Month | Year | ID | Amount |
|---------------------|------------------|--------------|-----------|
| 01 | 2001 | 1 | NaN |
|---------------------|------------------|--------------|-----------|
| 01 | 2002 | 1 | 200K |
|---------------------|------------------|--------------|-----------|
| 12 | 2018 | 1 | NaN |
|---------------------|------------------|--------------|-----------|
Upvotes: 2
Views: 367
Reputation: 59519
You want .reindex
, that way ID
is filled in for each row. There's no need for DataFrameA:
import pandas as pd
names = ['Month', 'Year', 'ID']
idx = pd.MultiIndex.from_product([range(1, 13, 1), range(2001, 2019, 1), B.ID.unique()],
names=names)
B = B.set_index(names).reindex(idx).reset_index()
print(B.head())
# Month Year ID Amount
#0 1 2001 2 NaN
#1 1 2001 1 NaN
#2 1 2001 3 NaN
#3 1 2002 2 NaN
#4 1 2002 1 200K
B.groupby('ID').size()
#ID
#1 216
#2 216
#3 216
#dtype: int64
Performant Cartesian Product has ways to optimize the product if it's rather slow
Upvotes: 1