Reputation: 1228
Is this the most efficient way to get the next row of data based on a matching value in the prior row? It seems terribly cumbersome, but the Int64Index type does not seem to play well.
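import pandas as pd
# Sample data: two URLs keyed by ID
df_of_urls = {'ID': [100,101], 'URL': ['https://www.firsturl.com','https://www.secondurl.com']}
df_of_urls = pd.DataFrame.from_dict(df_of_urls)
prior_url = 'https://www.firsturl.com'
# Find the row matching prior_url, step to the next index, and read column 1 (URL)
next_url = df_of_urls.iloc[df_of_urls[df_of_urls['URL']==prior_url ].index+1,1].values[0]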
Upvotes: 1
Views: 1178
Reputation: 164773
Indexing a series is more efficient than indexing a dataframe.
import numpy as np
# Index using iat accessor
next_url = df_of_urls['URL'].iat[np.where(df_of_urls['URL']==prior_url)[0][0] + 1]
# Index using NumPy array
next_url = df_of_urls['URL'].values[np.where(df_of_urls['URL']==prior_url)[0][0] + 1]
Both versions are inefficient in certain cases: the comparison always scans the full array, even when the condition is satisfied near the beginning. A manual loop can solve this problem by terminating as soon as the condition is satisfied.
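A minimal sketch of such a loop, written in plain Python. The helper name next_value_after and the choice to return None when there is no match (or when the match is the last element) are assumptions made for illustration, not part of pandas or NumPy:

```python
def next_value_after(values, target):
    """Return the element after the first occurrence of target, or None."""
    # Stop at the first match rather than comparing every element.
    # The last element is skipped because nothing follows it.
    for i in range(len(values) - 1):
        if values[i] == target:
            return values[i + 1]
    return None  # target not found, or it was the last element


# A plain list stands in for the URL column here.
urls = ['https://www.firsturl.com', 'https://www.secondurl.com']
next_url = next_value_after(urls, 'https://www.firsturl.com')
```

With the dataframe from the question, the call would be next_value_after(df_of_urls['URL'].values, prior_url). Whether this beats the vectorised version depends on the array size and on how early the match occurs, since a pure Python loop has a high per-element cost.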
See also: Efficiently return the index of the first value satisfying condition in array.
Upvotes: 1