Reputation: 428
I'm new to Pandas and coming from a SQL/SAS background. I'm trying to take financial information from a prior period and merge it onto my DataFrame. Here's a data example:
np.random.seed(0)
df = pd.DataFrame({
'company': pd.Series(['A','A','A','B','B','B','C','C','C']),
'year': pd.Series([1,2,3,1,3,4,2,3,4]),
'quarter': pd.Series([2,2,2,3,4,4,1,1,3]),
'amt': np.random.randn(9)
})
What I want is to take amt
from the prior year when company
and quarter
match. Here's what I would do in SQL:
SELECT a.*, b.amt as amt_prior
FROM df as a
LEFT JOIN df as b
ON a.company = b.company
AND a.quarter = b.quarter
AND a.year = (b.year + 1);
What I have come up with is the following. It works but it seems like I'm not doing something the Pandas way. Is there a better way for me to do this?
df2 = pd.merge(
df, df,
how='left',
left_on=['company', 'quarter', 'year'],
right_on=['company', 'quarter', np.array(df['year']) + 1],
suffixes=('', '_prior')
)
Upvotes: 0
Views: 77
Reputation: 402483
You can use a groupby
and transform
+ shift
:
np.random.seed(0)
df
amt company quarter year
0 1.764052 A 2 1
1 0.400157 A 2 2
2 0.978738 A 2 3
3 2.240893 B 3 1
4 1.867558 B 4 3
5 -0.977278 B 4 4
6 0.950088 C 1 2
7 -0.151357 C 1 3
8 -0.103219 C 3 4
priors = df.groupby(['company', 'quarter'])\
.shift().add_suffix('_prior')
priors
amt_prior year_prior
0 NaN NaN
1 1.764052 1.0
2 0.400157 2.0
3 NaN NaN
4 NaN NaN
5 1.867558 3.0
6 NaN NaN
7 0.950088 2.0
8 NaN NaN
Now, concatenate priors
with the original:
df = pd.concat([df, priors], 1)
df
amt company quarter year amt_prior year_prior
0 1.764052 A 2 1 NaN NaN
1 0.400157 A 2 2 1.764052 1.0
2 0.978738 A 2 3 0.400157 2.0
3 2.240893 B 3 1 NaN NaN
4 1.867558 B 4 3 NaN NaN
5 -0.977278 B 4 4 1.867558 3.0
6 0.950088 C 1 2 NaN NaN
7 -0.151357 C 1 3 0.950088 2.0
8 -0.103219 C 3 4 NaN NaN
This matches with your expected output df2
.
You can also get the same answer using Vaishali's solution, but using 2 groupby
s for the same operation is inefficient.
Upvotes: 2
Reputation: 38415
You can use
df['amt_prior'] = df.groupby(['company', 'quarter']).amt.shift()
df['year_prior'] = df.groupby(['company', 'quarter']).year.shift()
amt company quarter year amt_prior year_prior
0 -1.309558 A 2 1 NaN NaN
1 0.579060 A 2 2 -1.309558 1.0
2 -0.373324 A 2 3 0.579060 2.0
3 -0.515989 B 3 1 NaN NaN
4 1.013687 B 4 3 NaN NaN
5 1.419462 B 4 4 1.013687 3.0
6 -0.918983 C 1 2 NaN NaN
7 0.121057 C 1 3 -0.918983 2.0
8 -0.188670 C 3 4 NaN NaN
Upvotes: 3