Reputation: 739
I would like to get the difference between each pair of consecutive rows in the column Duration
and then store the values in a new column difference
or print them.
So basically I want: row(1)-row(2)=difference1, row(3)-row(4)=difference2, row(5)-row(6)=difference3, ...
Example code:
import pandas as pd
data = {'Profession': ['Teacher', 'Banker', 'Teacher', 'Judge', 'lawyer', 'Teacher'], 'Gender': ['Male', 'Male', 'Female', 'Male', 'Male', 'Female'], 'Size': ['M', 'M', 'L', 'S', 'S', 'M'], 'Duration': ['5', '6', '2', '3', '4', '7']}
data2 = {'Profession': ['Doctor', 'Scientist', 'Scientist', 'Banker', 'Judge', 'Scientist'], 'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male'], 'Size': ['L', 'M', 'L', 'M', 'L', 'L'], 'Duration': ['1', '2', '9', '10', '1', '17']}
data3 = {'Profession': ['Banker', 'Banker', 'Doctor', 'Doctor', 'lawyer', 'Teacher'], 'Gender': ['Male', 'Male', 'Female', 'Female', 'Female', 'Male'], 'Size': ['S', 'M', 'S', 'M', 'L', 'S'], 'Duration': ['15', '8', '5', '2', '11', '10']}
data4 = {'Profession': ['Judge', 'Judge', 'Scientist', 'Banker', 'Judge', 'Scientist'], 'Gender': ['Female', 'Female', 'Female', 'Female', 'Female', 'Female'], 'Size': ['M', 'S', 'L', 'S', 'M', 'S'], 'Duration': ['1', '2', '9', '10', '1', '17']}
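df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
df4 = pd.DataFrame(data4)
DATA = pd.concat([df, df2, df3, df4])
# note: the result of this groupby is not assigned, so DATA is unchanged
DATA.groupby(['Profession', 'Size', 'Gender']).agg('sum')
D = DATA.reset_index()
# Duration holds strings, so convert it to numbers before calling diff
D['Duration'] = pd.to_numeric(D['Duration'])
D['difference'] = D['Duration'].diff(-1)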
I tried using diff(-1), but it computes the difference for every row rather than once per pair of rows, so it's not exactly what I'm looking for. Any ideas?
Upvotes: 1
Views: 63
Reputation: 640
Is that what you wanted?
# put the next row's Duration next to each row
D["Neighbour"] = D["Duration"].shift(-1)
# the last row has no next row, so fill the gap with 0
D["Neighbour"] = D["Neighbour"].fillna(0)
# convert columns "Neighbour" and "Duration" to numeric
D["Neighbour"] = pd.to_numeric(D["Neighbour"])
D["Duration"] = pd.to_numeric(D["Duration"])
# get the difference between each row and the next one
D["difference"] = D["Duration"] - D["Neighbour"]
# remove the helper column "Neighbour"
D = D.drop(columns=["Neighbour"])
# keep the difference only on the first row of each pair (blank out rows 1, 3, 5, ...)
D.loc[1::2, "difference"] = None
# print D
D
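A shorter variant of the same idea might look like the sketch below. It is only an illustration, not a tested drop-in: the small frame is a hypothetical stand-in for D (just the first six Duration values from the question), and it assumes rows are paired by position (rows 1 and 2, rows 3 and 4, and so on). It combines diff(-1) with a positional mask, so no helper column is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for D; only Duration matters here.
D = pd.DataFrame({"Duration": ["5", "6", "2", "3", "4", "7"]})

# Duration holds strings in the question, so convert it first.
duration = pd.to_numeric(D["Duration"])

# True for the first row of each pair (positions 0, 2, 4, ...).
first_of_pair = np.arange(len(D)) % 2 == 0

# diff(-1) gives row(i) - row(i+1); keep it only on the first row of each pair.
D["difference"] = duration.diff(-1).where(first_of_pair)

print(D)
# difference column: -1.0, NaN, -1.0, NaN, -3.0, NaN
```

Because the mask is built from positions rather than index labels, this should also work when the index is not 0, 1, 2, ... (for example right after pd.concat without reset_index).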
Upvotes: 1