Reputation: 20150
I have this:
partial = {}
for d in devs["d"]:
    for k in a1km:
        total = len(cp[(cp["r"]==d) & (cp["s"]==k)])
        partial.update({str(d)+str(k): total})
Variables cp and devs are pandas dataframes, and a1km is a dictionary that contains each site and all sites within 1 km of it (precalculated). The output I'm after is, for each d and each site k, the count of all records in the cp dataframe that match the query, stored in total, so:
d, k, total
I've never worked with numpy, and I'm trying to learn as fast as I can, but the library is just too big for me to process given the time constraints in my lab. So my question is: how do I "translate" the code above to numpy to improve performance?
Upvotes: 0
Views: 59
Reputation: 6505
You can filter the dataframe and use pandas.DataFrame.groupby:
tmp = cp[(cp['r'].isin(devs['d'].unique())) & (cp['s'].isin(a1km))]
result_df = tmp.groupby(['r','s']).size()
Note that this can still be quite slow on a large dataframe.
Then to make it into a dictionary:
partial = {str(k[0]) + str(k[1]): v for k,v in result_df.to_dict().items()}
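Here is a minimal end-to-end sketch of the approach above, run on made-up data (the sample values of cp, devs and a1km are assumptions for illustration only, not the asker's data). One difference from the original loop is worth knowing about: groupby only returns the (r, s) pairs that actually occur, while the loop also stores a 0 for pairs with no matching rows. If those zeros matter, reindexing over the full product of devices and sites is one way to get them back:

```python
import pandas as pd

# Hypothetical sample data; the real cp, devs and a1km come from the question.
cp = pd.DataFrame({
    "r": [1, 1, 2, 2, 2, 3],
    "s": ["a", "a", "a", "b", "c", "a"],
})
devs = pd.DataFrame({"d": [1, 2]})
a1km = {"a": ["b"], "b": ["a"]}  # iterating a dict yields its keys (the sites)

# Keep only rows whose r is a known device and whose s is a key of a1km.
mask = cp["r"].isin(devs["d"].unique()) & cp["s"].isin(list(a1km))
counts = cp[mask].groupby(["r", "s"]).size()

# groupby drops pairs with no matching rows, whereas the loop stores 0 for them.
# Reindexing over every (device, site) combination restores those zero entries.
full_index = pd.MultiIndex.from_product(
    [devs["d"].unique(), list(a1km)], names=["r", "s"]
)
counts = counts.reindex(full_index, fill_value=0)

# Same key format as the original loop: str(d) + str(k).
partial = {str(d) + str(k): int(total) for (d, k), total in counts.items()}
print(partial)
```

If you don't need the zero entries, you can skip the reindex step and build the dictionary straight from the groupby result.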
Upvotes: 1