Reputation: 311
How can I iterate over pairs of rows of a Pandas DataFrame?
For example:
import pandas as pd
content = [(1, 2, [1, 3]), (3, 4, [2, 4]), (5, 6, [6, 9]), (7, 8, [9, 10])]
df = pd.DataFrame(content, columns=["a", "b", "interval"])
print(df)
output:
a b interval
0 1 2 [1, 3]
1 3 4 [2, 4]
2 5 6 [6, 9]
3 7 8 [9, 10]
Now I would like to do something like
for (indx1, row1), (indx2, row2) in df.?:
    print("row1:", row1, sep="\n")
    print("row2:", row2, sep="\n")
    print("\n")
which should output
row1:
a 1
b 2
interval [1,3]
Name: 0, dtype: int64
row2:
a 3
b 4
interval [2,4]
Name: 1, dtype: int64
row1:
a 3
b 4
interval [2,4]
Name: 1, dtype: int64
row2:
a 5
b 6
interval [6,9]
Name: 2, dtype: int64
row1:
a 5
b 6
interval [6,9]
Name: 2, dtype: int64
row2:
a 7
b 8
interval [9,10]
Name: 3, dtype: int64
Is there a built-in way to achieve this? I looked at df.groupby(df.index // 2) and df.itertuples, but neither of these methods seems to do what I want.
Edit: The overall goal is to get a list of bools indicating whether the intervals in column "interval" overlap. In the above example the list would be
overlaps = [True, False, False]
So one bool for each pair.
Upvotes: 8
Views: 12568
Reputation: 28303
Shift the DataFrame and concatenate it back to the original using axis=1, so that each interval and the next interval are in the same row:
df_merged = pd.concat([df, df.shift(-1).add_prefix('next_')], axis=1)
df_merged
#Out:
a b interval next_a next_b next_interval
0 1 2 [1, 3] 3.0 4.0 [2, 4]
1 3 4 [2, 4] 5.0 6.0 [6, 9]
2 5 6 [6, 9] 7.0 8.0 [9, 10]
3 7 8 [9, 10] NaN NaN NaN
Define an intersects function that works with your list representation and apply it to the merged DataFrame, ignoring the last row, where next_interval is null:
def intersects(left, right):
    return left[1] > right[0]
df_merged[:-1].apply(lambda x: intersects(x.interval, x.next_interval), axis=1)
#Out:
0 True
1 False
2 False
dtype: bool
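If only the list of bools is needed, the same comparison can be written without building the merged frame. This is a minimal sketch, not a replacement for the approach above: it reuses the intersects function from this answer, which assumes each interval is a [start, end] list and that rows are already sorted by start, as in the question's data.

```python
import pandas as pd

content = [(1, 2, [1, 3]), (3, 4, [2, 4]), (5, 6, [6, 9]), (7, 8, [9, 10])]
df = pd.DataFrame(content, columns=["a", "b", "interval"])


def intersects(left, right):
    # Assumes [start, end] lists and rows sorted by start.
    return left[1] > right[0]


# Pair every interval with the interval in the following row.
intervals = df["interval"].tolist()
overlaps = [intersects(cur, nxt) for cur, nxt in zip(intervals, intervals[1:])]
print(overlaps)  # [True, False, False]
```

Zipping a list with its own tail yields one pair fewer than there are rows, so no NaN row has to be dropped at the end.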
Upvotes: 16
Reputation: 29635
If you want to keep the for loop, using zip and iterrows could be a way:
for (indx1, row1), (indx2, row2) in zip(df[:-1].iterrows(), df[1:].iterrows()):
    print("row1:", row1, sep="\n")
    print("row2:", row2, sep="\n")
    print("\n")
To access the next row at the same time, start the second iterrows one row later with df[1:].iterrows(), and you get the output the way you want.
row1:
a 1
b 2
Name: 0, dtype: int64
row2:
a 3
b 4
Name: 1, dtype: int64
row1:
a 3
b 4
Name: 1, dtype: int64
row2:
a 5
b 6
Name: 2, dtype: int64
row1:
a 5
b 6
Name: 2, dtype: int64
row2:
a 7
b 8
Name: 3, dtype: int64
But as @RafaelC said, a for loop might not be the best method for your general problem.
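For the overall goal in the question's edit, the same zip and iterrows loop can collect one bool per pair instead of printing. The sketch below is only an illustration; the overlap condition (strict, so intervals that merely touch at an endpoint do not count) is my assumption about what "overlap" should mean here, chosen because it reproduces the expected list from the question.

```python
import pandas as pd

content = [(1, 2, [1, 3]), (3, 4, [2, 4]), (5, 6, [6, 9]), (7, 8, [9, 10])]
df = pd.DataFrame(content, columns=["a", "b", "interval"])

overlaps = []
for (indx1, row1), (indx2, row2) in zip(df[:-1].iterrows(), df[1:].iterrows()):
    start1, end1 = row1["interval"]
    start2, end2 = row2["interval"]
    # Assumed definition: two intervals overlap when each one starts
    # strictly before the other ends. This does not require sorted rows.
    overlaps.append(start1 < end2 and start2 < end1)

print(overlaps)  # [True, False, False]
```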
Upvotes: 3
Reputation: 81
You could try iloc indexing.
Example:
for i in range(df.shape[0] - 1):
    idx1, idx2 = i, i + 1
    row1, row2 = df.iloc[idx1], df.iloc[idx2]
    print(row1)
    print(row2)
    print()
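The loop above can be wrapped in a small generator so the pairing is reusable. This is just a sketch: iter_row_pairs is a hypothetical helper name, not something pandas provides. Because iloc is positional, it also works when the index is not the default 0..n-1, which the example below uses letter labels to show.

```python
import pandas as pd


def iter_row_pairs(frame):
    """Yield (row, next_row) for each pair of consecutive rows, by position."""
    for i in range(len(frame) - 1):
        yield frame.iloc[i], frame.iloc[i + 1]


content = [(1, 2, [1, 3]), (3, 4, [2, 4]), (5, 6, [6, 9]), (7, 8, [9, 10])]
# Letter labels instead of the default integer index (illustrative only).
df = pd.DataFrame(content, columns=["a", "b", "interval"], index=list("wxyz"))

# Each yielded row is a Series whose .name is its index label.
pairs = [(row1.name, row2.name) for row1, row2 in iter_row_pairs(df)]
print(pairs)  # [('w', 'x'), ('x', 'y'), ('y', 'z')]
```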
Upvotes: 0
Reputation: 27879
To get the output you've shown use:
# Assumes the default integer index (0, 1, 2, ...), since iloc is positional.
for row in df.index[:-1]:
    print('row 1:')
    print(df.iloc[row].squeeze())
    print('row 2:')
    print(df.iloc[row + 1].squeeze())
    print()
Upvotes: 0