Reputation: 3299
The objective is to subtract a row (N
) with previous row (N-1
) separated by groups.
Given a df
years nchar nval
0 2019 a 1
1 2019 b 1
2 2019 c 1
3 2020 a 1
4 2020 s 4
Lets,separate into group of year 2019
, and we denote it as df_2019
For df_2019, there we assign constant 10
.
Then,only for index 0
, we do the following operation and assign to a new column 'B`
df_2019.loc[df_2019.index[0], 'B']= 10 - df_2019['nval'].values[0]
Whereas, the other index
df_2019.loc[df_2019.index[N], 'B'] = df_2019['B'].values[N-1] - df_2019['nval'].values[N]
This, will produced the following table
years nchar nval C D B
1 2019 a 1 9
2 2019 b 1 8
3 2019 c 1 7
For the group 2020
, the same computation apply. However, the only difference is, the constant value is the 7
, which is taken from the last index of column B.
To answer this requirement, the following code is produced with extra possible groups.
import pandas as pd
year=[2019,2019,2019,2020,2020,2020,2020,2022,2022,2022]
nval=[1,1,1,1,4,1,4,5,6,7]
nchar=['a','b','c','a','s','c','a','b','c','g']
df=pd.DataFrame(zip(year,nchar,nval),columns=['years','nchar','nval'])
print(df)
year_ls=[2019,2020,2022]
nspacing_total=2
nspacing_between_df=4
all_df=[]
default_val=10
for idx,dyear in enumerate(year_ls):
df_=df[df['years']==dyear].reset_index(drop=True)
t=pd.DataFrame([[''] * 3]*len(df_), columns=["C", "D", "B"])
df_=pd.concat([df_,t],axis=1)
Total = df_['nval'].sum()
df_=pd.DataFrame([[''] * len(df.columns)]*1, columns=df.columns).append(df_).reset_index(drop=True)
if idx ==0:
df_.loc[df_.index[0], 'B']=default_val
if idx !=0:
pre_df=all_df[idx-1]
pre_val=pre_df['B'].values[-1]
nposi=1
pre_years=pre_df['years'].values[nposi]
df_.loc[df_.index[0], 'nchar']=f'From {pre_years}'
df_.loc[df_.index[0], 'B']=pre_val
for ndexd in range(df_.shape[0]-1):
df_.loc[df_.index[ndexd+1], 'B']=df_['B'].values[ndexd]-df_['nval'].values[ndexd+1]
df_=df_.append(pd.DataFrame([[''] * len(df.columns)]*nspacing_total, columns=df.columns)).reset_index(drop=True)
df_.loc[df_.index[-1], 'nval']=Total
df_.loc[df_.index[-1], 'nchar']='Total'
df_.loc[df_.index[-1], 'B']=df_['B'].values[0]-df_['nval'].values[-1]
all_df.append(df_)
However, I wonder whether this proposal can be further simplified further using pandas groupby
or other. I really appreciate for any tips.
Ultimately, I would like to express the table as below, which will be exported to excel
years nchar nval C D B
0 10
1 2019 a 1 9
2 2019 b 1 8
3 2019 c 1 7
4
5 Total 3 7
6
7
8
9
10 From 2019 7
11 2020 a 1 6
12 2020 s 4 2
13 2020 c 1 1
14 2020 a 4 -3
15
16 Total 10 -3
17
18
19
20
21 From 2020 -3
22 2022 b 5 -8
23 2022 c 6 -14
24 2022 g 7 -21
25
26 Total 18 -21
27
28
29
30
The code to produced the above table
# Optional to represent the table above
all_ap_df=[]
for a_df in all_df:
df=a_df.append(pd.DataFrame([[''] * len(df.columns)]*nspacing_between_df, columns=df.columns)).reset_index(drop=True)
all_ap_df.append(df)
df=pd.concat(all_ap_df,axis=0).reset_index(drop=True)
df.loc[df_.index[0], 'D']=df['B'].values[0]
df.loc[df_.index[0], 'B']=''
df = df.fillna('')
Upvotes: 1
Views: 81
Reputation: 323306
In your case chain with groupby
df['new'] = df.groupby('years')['nval'].cumsum().rsub(10)
Out[8]:
0 9
1 8
2 7
3 9
4 5
Name: nval, dtype: int64
Upvotes: 1
Reputation:
I think this is actually quite simple. Use groupby
+ cumsum
:
df['B'] = 10 - df['nval'].cumsum()
Output:
>>> df
years nchar nval B
0 2019 a 1 9
1 2019 b 1 8
2 2019 c 1 7
3 2020 a 1 6
4 2020 s 4 2
Upvotes: 1