Alternatives to iloc for searching dataframes

Question

I have a simple piece of code that iterates through a list of IDs. If an ID is in a particular DataFrame column (in this case, the column is called uniqueid), it uses iloc to get the value from another column on the matching row and adds it to a dictionary, with the ID as the key:

import pandas as pd
import geopandas as gpd

union_cols = ['uniqueid', 'FLD_ZONE', 'FLD_ZONE_1', 'ST_FIPS', 'CO_FIPS', 'CID']
union_df = gpd.GeoDataFrame.from_features(records(union_gdb, union_cols))

pop_df = pd.read_csv(pop_csv, low_memory=False)  # Example dataframe
uniqueid_inin = ['', 'FL1234', 'F54323', ....]  # Just an example
inin_dict = dict()

for uid in uniqueid_inin:  # 'uid' avoids shadowing the built-in id()
    # Skip empty ids and only look up ids that exist in pop_df['uniqueid']
    if uid != '' and uid in pop_df['uniqueid'].values:
        v = pop_df.loc[pop_df['uniqueid'] == uid, 'Pop_By_Area'].iloc[0]
        inin_dict.setdefault(uid, v)
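To make the loop easy to experiment with, here is a self-contained sketch that runs it on a small hypothetical DataFrame in place of the CSV (the population values and the ids 'GA0001' and 'TX9999' are made up for illustration). The comment inside the loop points at why it gets slow on large data: each matching id costs two full scans of the uniqueid column.

```python
import pandas as pd

# Hypothetical stand-in for pop_df = pd.read_csv(pop_csv, ...)
pop_df = pd.DataFrame({
    'uniqueid': ['FL1234', 'F54323', 'GA0001'],
    'Pop_By_Area': [120.5, 87.0, 42.25],
})

# Hypothetical id list; '' and 'TX9999' have no match in pop_df
uniqueid_inin = ['', 'FL1234', 'F54323', 'TX9999']

inin_dict = {}
for uid in uniqueid_inin:
    # For a matching id, the membership test and the boolean mask each scan
    # the whole column, so total work grows roughly with
    # len(uniqueid_inin) * len(pop_df).
    if uid != '' and uid in pop_df['uniqueid'].values:
        v = pop_df.loc[pop_df['uniqueid'] == uid, 'Pop_By_Area'].iloc[0]
        inin_dict.setdefault(uid, v)

print(inin_dict)
```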

This works, but it is very slow. Is there a quicker way to do this?

gwydion93 · Accepted Answer

To resolve this issue (and make the process more efficient), I had to think about the problem differently, taking advantage of vectorized pandas operations instead of relying on a generic Python loop. First, I got a list of only the uniqueids from my union_df that were definitely present in pop_df. Filtering up front guarantees that every ID used later has a match in pop_df, so no lookup comes back empty (calling .iloc[0] on an empty selection raises an IndexError).

# Get the array of uniqueids present in pop_df
pop_uniqueids = pop_df['uniqueid'].unique()

# Keep only the union_df rows whose uniqueid is in pop_uniqueids, then
# renumber the rows from 0 without keeping the old index as a column
union_df = union_df.loc[union_df['uniqueid'].isin(pop_uniqueids)]
union_df = union_df.reset_index(drop=True)
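The filtering step can be tried on toy data with plain pandas. This sketch assumes small hypothetical frames, uses an ordinary DataFrame in place of the GeoDataFrame (GeoDataFrame subclasses DataFrame, so isin() and reset_index() come from pandas either way), and the 'in-out' zone_status value is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for pop_df and union_df
pop_df = pd.DataFrame({
    'uniqueid': ['FL1234', 'F54323', 'GA0001'],
    'Pop_By_Area': [120.5, 87.0, 42.25],
})
union_df = pd.DataFrame({
    'uniqueid': ['', 'FL1234', 'TX9999', 'F54323'],
    'zone_status': ['in-in', 'in-in', 'in-in', 'in-out'],
})

# Keep only the union_df rows whose uniqueid also appears in pop_df;
# the empty id and 'TX9999' are dropped here
pop_uniqueids = pop_df['uniqueid'].unique()
union_df = union_df.loc[union_df['uniqueid'].isin(pop_uniqueids)]

# drop=True discards the old index instead of adding an 'index' column
union_df = union_df.reset_index(drop=True)

print(union_df)
```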

Because the uniqueid_inin list is then built from this filtered union_df (by taking the uniqueids from rows where my zone_status column equals 'in-in'), it contains only IDs that are definitely in pop_df, and empty values are no longer an issue. Then I simply create a subset DataFrame using the list and zip the two desired columns together into a dictionary:

inin_subset = pop_df[pop_df['uniqueid'].isin(uniqueid_inin)]
inin_pop_dict = dict(zip(inin_subset.uniqueid, inin_subset.Pop_By_Area))
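A minimal end-to-end sketch of this second step, again on hypothetical data. One behavioural difference from the question's loop is worth knowing: if an id appears more than once in pop_df, dict(zip(...)) keeps the last matching value, whereas setdefault() in the loop kept the first. The sketch shows both; the drop_duplicates() variant is an addition here, not part of the answer above.

```python
import pandas as pd

# Hypothetical data; 'FL1234' appears twice in pop_df on purpose
pop_df = pd.DataFrame({
    'uniqueid': ['FL1234', 'F54323', 'GA0001', 'FL1234'],
    'Pop_By_Area': [120.5, 87.0, 42.25, 999.0],
})
union_df = pd.DataFrame({
    'uniqueid': ['FL1234', 'F54323', 'GA0001'],
    'zone_status': ['in-in', 'in-in', 'in-out'],
})

# Ids from rows whose zone_status is 'in-in'
uniqueid_inin = union_df.loc[union_df['zone_status'] == 'in-in', 'uniqueid'].tolist()

# One vectorized membership test instead of one column scan per id
inin_subset = pop_df[pop_df['uniqueid'].isin(uniqueid_inin)]

# dict(zip(...)) keeps the LAST value seen for a repeated id
last_wins = dict(zip(inin_subset['uniqueid'], inin_subset['Pop_By_Area']))

# drop_duplicates() keeps the first row per id by default, matching the
# setdefault() behaviour of the original loop
first_rows = inin_subset.drop_duplicates(subset='uniqueid')
first_wins = dict(zip(first_rows['uniqueid'], first_rows['Pop_By_Area']))

print(last_wins)
print(first_wins)
```

When uniqueid is already unique in pop_df, both dictionaries are identical and the extra drop_duplicates() call is unnecessary.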

I hope this technique helps.
