Reputation: 1923
I have a simple piece of code that iterates through a list of id's, and if an id is in a particular data frame column(in this case, the column is called uniqueid
), it uses iloc
to get the value from another column on the matching row and then adds it to as a value in a dictionary with the id as the key:
union_cols = ['uniqueid', 'FLD_ZONE', 'FLD_ZONE_1', 'ST_FIPS', 'CO_FIPS', 'CID']
union_df = gpd.GeoDataFrame.from_features(records(union_gdb, union_cols))
pop_df = pd.read_csv(pop_csv, low_memory=False) # Example dataframe
uniqueid_inin = ['', 'FL1234', 'F54323', ....] # Just an example
isin_dict = dict()
for id in uniqueid_inin:
if (id is not '') & (id in pop_df.uniqueid.values):
v = pop_df.loc[pop_df['uniqueid'] == id, 'Pop_By_Area'].iloc[0]
inin_dict.setdefault(id, v)
This works, but it is very slow. Is there a quicker way to do this?
Upvotes: 2
Views: 4705
Reputation: 1923
To resolve this issue (and make the process more efficient) I had to think about the process in a different way that took advantage of Pandas
and didn't rely on a generic Python solution. I first had to get a list of only the uniqueids from my union_df that were absolutely in pop_df. If they were not, applying the .isin()
method would throw an indexing error.
#Get list of uniqueids in pop_df
pop_uniqueids = pop_df['uniqueid'].unique()
#Get only the union_df rows where the uniqueid matches pop_uniqueid
union_df = union_df.loc[(union_df['uniqueid'].isin(pop_uniqueids))]
union_df = union_df.reset_index()
union_df = union_df.drop(columns='index')
When the uniqueid_inin
list is created from union_df (by just getting the uniqueid
's from rows where my zone_status
column is equal to 'in-in'), it will only contain a subset of items that are definitely in pop_df
and empty values are no longer an issue. Then, I simply create a subset dataframe using the list and zip the desired column values together in a dictionary
:
inin_subset =pop_df[ pop_df['uniqueid'].isin(uniqueid_inin)]
inin_pop_dict = dict(zip(inin_subset.uniqueid, inin_subset.Pop_By_Area))
I hope this technique helps.
Upvotes: 2