Looping through Pandas dataframe to generate list - most efficient way

Question

I have a dataframe in the following format (actually ~200,000 rows. Of these ~20% are active - 'Y', the rest are 'N'):

active  adtype           body      eng          first scan   id
N       Private Seller Car  Â Coupe Â 8cyl 4.7L     31/01/2016  SSE-AD-3469148
Y       Dealer: Near New    Â Coupe Â 12cyl 6.5L    31/01/2016  OAG-AD-12326299
N       Dealer: Used Car    Â Coupe Â 12cyl 6.5L    31/01/2016  OAG-AD-6834787

I'm creating a list of the ids that are then cross-checked against some website scrape data to find new items:

database_ids = database_records['id'].tolist() #simple list of ad IDs from CSV
database_ids = set(database_ids)
database_dicts = database_records.to_dict(orient='records') #Converted to list of dicts  
newads = []
adscrape_ids = []

#Search database for existing ads. Append new ads to 'newads'
 for ad in adscrape:
     ad['last scan'] = date
     ad['active'] = 'Y'
     adscrape_ids.append(ad['id'])
     if ad['id'] not in database_ids:
         ad['first scan'] = date
         print 'new ad:',ad
         newads.append(ad)

I want to speed this process up by limiting database_ids to only ids that are still active ('Y'). Is there any pandas-specific, efficient way of doing this, or should I just create a loop:

for row in database_dicts:
    if row['active'] == 'Y':
        database_ids.append(row['id'])
database_ids = set(database_ids)

Ami Tavory · Accepted Answer

You can do this far more efficiently (I'm willing to bet you'll be able to see a noticeable difference in speed):

set(database_dicts[database_dicts.active == 'Y']['id'].unique())

database_dicts[database_dicts.active == 'Y'] filters and retains the rows you want.
.unique() will return the unique values (in this case, of the id column).

In general, you should try to do as much as possible while the data is in the DataFrame - it is much more efficient than loops and pure Python.

Looping through Pandas dataframe to generate list - most efficient way

Answers (1)

Related Questions