user2723494

Reputation: 1228

return next row of dataframe based on matching condition

Is this the most efficient way to get the next row of data based on a matching value in the prior row? It seems terribly cumbersome, but the Int64Index type does not seem to play well with anything simpler.

import pandas as pd

df_of_urls = {'ID': [100, 101], 'URL': ['https://www.firsturl.com', 'https://www.secondurl.com']}
df_of_urls = pd.DataFrame.from_dict(df_of_urls)

prior_url = 'https://www.firsturl.com'

next_url = df_of_urls.iloc[df_of_urls[df_of_urls['URL'] == prior_url].index + 1, 1].values[0]

Upvotes: 1

Views: 1178

Answers (1)

jpp

Reputation: 164773

Indexing a Series is more efficient than indexing a DataFrame, so select the column first and then index by position.

import numpy as np

# Position of the first row whose URL matches prior_url
pos = np.where(df_of_urls['URL'] == prior_url)[0][0]

# Index using iat accessor
next_url = df_of_urls['URL'].iat[pos + 1]

# Index using NumPy array
next_url = df_of_urls['URL'].values[pos + 1]

This approach is inefficient in some cases: the comparison always scans the entire column, even when the match occurs near the beginning. A manual loop avoids this by stopping as soon as the condition is satisfied.
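As a rough sketch of such a loop (the helper name next_value_after is hypothetical, not part of pandas or NumPy), something like the following stops at the first match. A plain Python loop is slow per element, so it is only likely to pay off when the match tends to occur early in a long column; treat it as an illustration rather than a benchmarked replacement.

```python
def next_value_after(values, target):
    """Return the element that follows the first occurrence of target.

    Hypothetical helper for illustration. Returns None when target is
    absent or is the last element, so there is no "next" value.
    """
    # Stop one short of the end: the last element has no successor.
    for i in range(len(values) - 1):
        if values[i] == target:
            # Early exit: nothing past the first match is examined.
            return values[i + 1]
    return None


# Stand-in for the URL column; with the DataFrame from the question,
# pass df_of_urls['URL'].values instead of this list.
urls = ['https://www.firsturl.com', 'https://www.secondurl.com']
next_url = next_value_after(urls, 'https://www.firsturl.com')
```

Returning None for a missing or final match is a design choice made here for the sketch; the one-liners above raise an IndexError in those cases instead.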

See also: Efficiently return the index of the first value satisfying condition in array.

Upvotes: 1
