Reuben Schmidt

Reputation: 23

Searching for data in dataframe

Firstly, my apologies if this question is too simple / obvious.

My question is:

I am using nested loops to check whether certain images are listed in a dataframe ('old_df'). If they are present, I add them to an empty list ('new_list').

Is there a faster or more performant way to do this?

import os

images = []

for root, dirs, files in os.walk('/gdrive/MyDrive/CNN_Tute/data/images/'):
  for file in files:
    images.append(file)

new_list = []

for i in range(len(images)):
  for j in range(len(old_df)):
    if images[i] == old_df.iloc[j, 0]:
      new_list.append(old_df.iloc[j, :])

Upvotes: 1

Views: 60

Answers (2)

jezrael

Reputation: 863501

If you want to test the first column by position:

images = [file for root, dirs, files in os.walk('/gdrive/MyDrive/CNN_Tute/data/images/')
          for file in files]

new_list = old_df.iloc[old_df.iloc[:, 0].isin(images).to_numpy(), 0].tolist()
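Note that the line above returns only the matching values from the first column, whereas the loop in the question appends whole rows. If whole rows are needed, a minimal sketch along the following lines may help. The data here is made up for illustration: the column names `filename` and `label`, the file names, and the variable `matched_df` are all assumptions, not taken from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the real directory listing and dataframe.
images = ["a.png", "c.png", "x.png"]
old_df = pd.DataFrame({
    "filename": ["a.png", "b.png", "c.png"],
    "label": [0, 1, 0],
})

# Boolean mask: True where the first column's value is among the image names.
mask = old_df.iloc[:, 0].isin(images)

# Keep the complete matching rows rather than only the file names.
matched_df = old_df[mask]

# Optional: a list of row Series, close to what the original loop builds.
new_list = [row for _, row in matched_df.iterrows()]
```

One difference to be aware of: the rows come back in the dataframe's order, not in the order of `images` as in the nested loop.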

Upvotes: 2

Serial Lazer

Reputation: 1669

You can achieve this in two lines:

images = [file for _, _, files in os.walk('/gdrive/MyDrive/CNN_Tute/data/images/') for file in files]

new_labels_df = old_df[old_df.iloc[:, 0].isin(images)]
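The directory-listing step can be tried on its own. The sketch below uses a temporary directory with two made-up files instead of the real Drive path, and collects the names into a set (`image_set` is a name chosen here for illustration). It relies on the fact that `os.walk` yields bare file names without their folder, so files with the same name in different subfolders would collapse into one entry.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root_dir:
    # Create a tiny, hypothetical directory tree: root/a.png and root/sub/b.png.
    os.makedirs(os.path.join(root_dir, "sub"))
    for rel in ("a.png", os.path.join("sub", "b.png")):
        open(os.path.join(root_dir, rel), "w").close()

    # Flatten every file name found under root_dir into a set.
    image_set = {name for _, _, files in os.walk(root_dir) for name in files}
```

`Series.isin` accepts a set as well as a list, so `image_set` can be passed to it directly.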

Upvotes: 0
