Reputation: 2807
One crucial step in my project is to track the absolute difference of values in a column of pandas dataframe for subsamples.
I managed to write a for-loop to create my subsamples. I select every person and go through every year this person is observed. I further accessed the index of each groups first element, and even compared it the each ones second element.
Here is my MWE data:
df = pd.DataFrame({'year': ['2001', '2004', '2005', '2006', '2007', '2008', '2009',
'2003', '2004', '2005', '2006', '2007', '2008', '2009',
'2003', '2004', '2005', '2006', '2007', '2008', '2009'],
'id': ['1', '1', '1', '1', '1', '1', '1',
'2', '2', '2', '2', '2', '2', '2',
'5', '5', '5','5', '5', '5', '5'],
'money': ['15', '15', '15', '21', '21', '21', '21',
'17', '17', '17', '20', '17', '17', '17',
'25', '30', '22', '25', '8', '7', '12']}).astype(int)
Here is my code:
# do it for all IDs in my dataframe
for i in df.id.unique():
# now check every given year for that particular ID
for j in df[df['id']==i].year:
# access the index of the first element of that ID, as integer
index = df[df['id']==i].index.values.astype(int)[0]
# use that index to calculate absolute difference of the first and second element
abs_diff = abs( df['money'].iloc[index] - df['money'].iloc[index+1] )
# print all the changes, before further calculations
index =+1
print(abs_diff)
My index is not updating. It yields 0000000 0000000 5555555 (3 x 7 changes) but it should show 0,0,0,6,0,0,0 0,0,0,3,-3,0,0 0,5,-8,3,-17,-1,5 (3 x 7 changes). Since the either first or last element have no change, I added 0 in front of each group.
Solution I changed the second loop from for to while:
for i in df.id.unique():
first = df[df['id']==i].index.values.astype(int)[0] # ID1 = 0
last = df[df['id']==i].index.values.astype(int)[-1] # ID1 = 6
while first < last:
abs_diff = abs( df['money'][first] - df['money'][first+1] )
print(abs_diff)
first +=1
Upvotes: 1
Views: 1274
Reputation: 3739
`for i in df.id.unique(): for j in df[df['id']==i].year:
index = df[(df['id']==i)&(df['year']==j)].index.values[0].astype(int)
try:
abs_diff = abs(df['money'].iloc[index] - df['money'].iloc[index+1] )
except:
pass
print(abs_diff)`
output: 0 0 6 0 0 0 4 0 0 3 3 0 0 8 5 8 3 17 1 5
Upvotes: 2
Reputation: 71580
You're currently always checking the first value of each batch, so you'd need to do:
# do it for all IDs in my dataframe
for i in df.id.unique():
# now check every given year for that particular ID
for idx,j in enumerate(df[df['id']==i].year):
# access the index of the first element of that ID, as integer
index = df[df['id']==i].index.values.astype(int)[idx]
# use that index to calculate absolute difference of the first and second element
try:
abs_diff = abs( df['money'][index] - df['money'][index+1] )
except:
continue
# print all the changes, before further calculations
index =+1
print(abs_diff)
Which outputs:
0
0
6
0
0
0
4
0
0
3
3
0
0
8
5
8
3
17
1
5
Upvotes: 1