Reputation: 159
Imagine we have a dataframe with structure like:
df = pd.DataFrame({
'Year':[2017, 2019, 2018, 2017, 2017, 2017],
'B':[4,5,4,5,5,4],
'C':[0,0,0,0,0,7],
'D':[0,1,3,5,7,1],
'E':[5,3,6,9,2,4],
General idea is to shift each rows, corresponds to value in the 'Year' column, 2017 is the base year, each other row should be shifted to the right on (Year - 2017) cells and new cells should be filled by zeros (0), like :
df = pd.DataFrame({
'Year':[2017, 2019, 2018, 2017, 2017, 2017],
'B':[4,0,0,5,5,4],
'C':[0,0,4,0,0,7],
'D':[0,5,0,5,7,1],
'E':[5,0,3,9,2,4],
'F':[0,1,6,0,0,0],
'G':[0,3,0,0,0,0],
})
ps: Actualy we then need to sum pair-wise some of result rows, so that 'Year' of each column are the same
It's only for first step when we sum 0 and 2 rows. Then it should be 1 and 3, and so on
So, maybe there is some pandas functionality which helps to do this task without pre-shifting...
Upvotes: 2
Views: 350
Reputation: 862671
If use shift
in pandas by default, then last columns are lost. So is necessary first add new columns filled by missing values - number of columns depends of difference of non 2017 values.
df = df.set_index('Year')
diff = np.setdiff1d(df.index.dropna().unique(), [2017]).astype(int)
print (diff)
[2018 2019]
df = df.assign(**{f'new{x}':np.nan for x in range(max(diff-2017))})
Then you can use shift
in loop and filter by DataFrame.loc
by years in index:
for y in diff:
df.loc[y, :] = df.astype(float).shift(y - 2017, axis=1).loc[y, :]
Last replace missing values, cast to integers and convert index to columns:
df = df.fillna(0).astype(int).reset_index()
print (df)
Year B C D E new0 new1
0 2017 4 0 0 5 0 0
1 2019 0 0 5 0 1 3
2 2018 0 4 0 3 6 0
3 2017 5 0 5 9 0 0
4 2017 5 0 7 2 0 0
5 2017 4 7 1 4 0 0
EDIT:
Solution with another column:
df = pd.DataFrame({
'new':list('abcdef'),
'Year':[2017, 2019, 2018, 2017, 2017, 2017],
'B':[4,5,4,5,5,4],
'C':[0,0,0,0,0,7],
'D':[0,1,3,5,7,1],
'E':[5,3,6,9,2,4]})
print (df)
new Year B C D E
0 a 2017 4 0 0 5
1 b 2019 5 0 1 3
2 c 2018 4 0 3 6
3 d 2017 5 0 5 9
4 e 2017 5 0 7 2
5 f 2017 4 7 1 4
df = df.set_index(['new','Year'])
diff = np.setdiff1d(df.index.get_level_values('Year').dropna().unique(), [2017]).astype(int)
print (diff)
[2018 2019]
df1 = pd.DataFrame(index=df.index, columns=['new{}'.format(x) for x in range(max(diff-2017))])
df = pd.concat([df, df1], axis=1)
print (df)
B C D E new0 new1
new Year
a 2017 4 0 0 5 NaN NaN
b 2019 5 0 1 3 NaN NaN
c 2018 4 0 3 6 NaN NaN
d 2017 5 0 5 9 NaN NaN
e 2017 5 0 7 2 NaN NaN
f 2017 4 7 1 4 NaN NaN
for y in diff:
idx = pd.IndexSlice
df.loc[idx[:, y], :] = df.astype(float).shift(y - 2017, axis=1).loc[idx[:, y], :]
df = df.fillna(0).astype(int).reset_index()
print (df)
new Year B C D E new0 new1
0 a 2017 4 0 0 5 0 0
1 b 2019 0 0 5 0 1 3
2 c 2018 0 4 0 3 6 0
3 d 2017 5 0 5 9 0 0
4 e 2017 5 0 7 2 0 0
5 f 2017 4 7 1 4 0 0
Upvotes: 1
Reputation: 7214
I created programmatically the steps you took from the first df frame to the last one. I did this because it seems like you may be looking for how to do that programmatically, and it may help with your end result. With a little more understanding, i can probably make this process easier:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Year':[2017, 2019, 2018, 2017, 2017, 2017],
'B':[4,5,4,5,5,4],
'C':[0,0,0,0,0,7],
'D':[0,1,3,5,7,1],
'E':[5,3,6,9,2,4],})
df.insert(column='F',loc=len(df)-1,value=np.zeros(len(df),dtype=int))
df.insert(column='G',loc=len(df)-1,value=np.zeros(len(df),dtype=int))
df1 = df.T
cols =df1.iloc[0]
df1.columns = cols
df1.drop('Year', inplace=True)
df1.iloc[0:, [1]] = np.roll(df1.iloc[0:, [1]], shift=2)
df1.iloc[0:, [2]] = np.roll(df1.iloc[0:, [2]], shift=1)
df = df1.T.reset_index()
res = df.iloc[2] + df.iloc[0]
df = df.append(res, ignore_index=True)
df['Year'][6]= 'res'
output:
Year B C D E G F
0 2017 4 0 0 5 0 0
1 2019 0 0 5 0 1 3
2 2018 0 4 0 3 6 0
3 2017 5 0 5 9 0 0
4 2017 5 0 7 2 0 0
5 2017 4 7 1 4 0 0
6 res 4 4 0 8 6 0
Upvotes: 1