Marco
Marco

Reputation: 2807

A loop for pairwise comparison with previous value within a column in Python/Pandas

One crucial step in my project is to track the absolute difference of values in a column of pandas dataframe for subsamples.

I managed to write a for-loop to create my subsamples. I select every person and go through every year this person is observed. I further accessed the index of each groups first element, and even compared it the each ones second element.

Here is my MWE data:

df = pd.DataFrame({'year': ['2001', '2004', '2005', '2006', '2007', '2008', '2009',
                             '2003', '2004', '2005', '2006', '2007', '2008', '2009',
                            '2003', '2004', '2005', '2006', '2007', '2008', '2009'],
                   'id': ['1', '1', '1', '1', '1', '1', '1', 
                          '2', '2', '2', '2', '2', '2', '2',
                         '5', '5', '5','5', '5', '5', '5'],
                   'money': ['15', '15', '15', '21', '21', '21', '21', 
                             '17', '17', '17', '20', '17', '17', '17',
                            '25', '30', '22', '25', '8', '7', '12']}).astype(int)

Here is my code:

# do it for all IDs in my dataframe
for i in df.id.unique():
# now check every given year for that particular ID
    for j in df[df['id']==i].year: 
# access the index of the first element of that ID, as integer
        index = df[df['id']==i].index.values.astype(int)[0]
# use that index to calculate absolute difference of the first and second element 
        abs_diff = abs( df['money'].iloc[index] - df['money'].iloc[index+1] )
# print all the changes, before further calculations
        index =+1
        print(abs_diff)

My index is not updating. It yields 0000000 0000000 5555555 (3 x 7 changes) but it should show 0,0,0,6,0,0,0 0,0,0,3,-3,0,0 0,5,-8,3,-17,-1,5 (3 x 7 changes). Since the either first or last element have no change, I added 0 in front of each group.

Solution I changed the second loop from for to while:

for i in df.id.unique():
first = df[df['id']==i].index.values.astype(int)[0] # ID1 = 0 
last = df[df['id']==i].index.values.astype(int)[-1] # ID1 = 6

while first < last:

    abs_diff = abs( df['money'][first] - df['money'][first+1] ) 
    print(abs_diff)
    first +=1

Upvotes: 1

Views: 1274

Answers (2)

tawab_shakeel
tawab_shakeel

Reputation: 3739

`for i in df.id.unique(): for j in df[df['id']==i].year:

    index = df[(df['id']==i)&(df['year']==j)].index.values[0].astype(int)
    try:
        abs_diff = abs(df['money'].iloc[index] - df['money'].iloc[index+1] )
    except:
        pass
    print(abs_diff)`

output: 0 0 6 0 0 0 4 0 0 3 3 0 0 8 5 8 3 17 1 5

Upvotes: 2

U13-Forward
U13-Forward

Reputation: 71580

You're currently always checking the first value of each batch, so you'd need to do:

# do it for all IDs in my dataframe
for i in df.id.unique():
# now check every given year for that particular ID
    for idx,j in enumerate(df[df['id']==i].year): 
# access the index of the first element of that ID, as integer
        index = df[df['id']==i].index.values.astype(int)[idx]
# use that index to calculate absolute difference of the first and second element
        try:
            abs_diff = abs( df['money'][index] - df['money'][index+1] )
        except:
            continue
# print all the changes, before further calculations
        index =+1
        print(abs_diff)

Which outputs:

0
0
6
0
0
0
4
0
0
3
3
0
0
8
5
8
3
17
1
5

Upvotes: 1

Related Questions