myeewyee
myeewyee

Reputation: 767

Looping through Pandas dataframe to generate list - most efficient way

I have a dataframe in the following format (actually ~200,000 rows. Of these ~20% are active - 'Y', the rest are 'N'):

active  adtype           body      eng          first scan   id
N       Private Seller Car   Coupe  8cyl 4.7L     31/01/2016  SSE-AD-3469148
Y       Dealer: Near New     Coupe  12cyl 6.5L    31/01/2016  OAG-AD-12326299
N       Dealer: Used Car     Coupe  12cyl 6.5L    31/01/2016  OAG-AD-6834787

I'm creating a list of the ids that are then cross-checked against some website scrape data to find new items:

database_ids = database_records['id'].tolist() #simple list of ad IDs from CSV
database_ids = set(database_ids)
database_dicts = database_records.to_dict(orient='records') #Converted to list of dicts  
newads = []
adscrape_ids = []

#Search database for existing ads. Append new ads to 'newads'
 for ad in adscrape:
     ad['last scan'] = date
     ad['active'] = 'Y'
     adscrape_ids.append(ad['id'])
     if ad['id'] not in database_ids:
         ad['first scan'] = date
         print 'new ad:',ad
         newads.append(ad)

I want to speed this process up by limiting database_ids to only ids that are still active ('Y'). Is there any pandas-specific, efficient way of doing this, or should I just create a loop:

for row in database_dicts:
    if row['active'] == 'Y':
        database_ids.append(row['id'])
database_ids = set(database_ids)

Upvotes: 1

Views: 116

Answers (1)

Ami Tavory
Ami Tavory

Reputation: 76297

You can do this far more efficiently (I'm willing to bet you'll be able to see a noticeable difference in speed):

set(database_dicts[database_dicts.active == 'Y']['id'].unique())
  • database_dicts[database_dicts.active == 'Y'] filters and retains the rows you want.

  • .unique() will return the unique values (in this case, of the id column).

In general, you should try to do as much as possible while the data is in the DataFrame - it is much more efficient than loops and pure Python.

Upvotes: 1

Related Questions