Reputation: 1273
I have a [long] pandas data frame with 2 columns. The first column is for a prescription number (keep in mind these are not unique as multiple rows can have the same prescription number). The second column is 1 item in that transaction number. I want to create a list of items for each transaction number (with duplicates removed) and put each of these lists into a larger, nested list with the length equaling the number of UNIQUE transaction numbers.
I have successfully achieved this feat, however, it takes a while to run and I would like to know a better (i.e., faster) way of doing it. My code is below:
# get the unique values for prescription
list_prescription = list(pd.value_counts(df['prescription']).index)
# make a list of product_name for each tx_plan_id_date (this will be time consuming)
time_start = datetime.datetime.now()
counter = 1
list_list_product_name = []
for prescription in list_prescription:
# subset to just that tx_plan_id_date
df_subset = df[df['prescription'] == prescription]
# put product_name into a list
list_product_name = list(df_subset['product_name'])
# remove any duplicates
list_product_name = list(dict.fromkeys(list_product_name))
# append list_product_name to list_list_product_name
list_list_product_name.append(list_product_name)
# get current time
time_current = datetime.datetime.now()
# get minutes elapsed from time_start
time_elapsed = (time_current - time_start).seconds/60
# print a message to the console for status
stdout.write('\r{0}/{1}; {2:0.4f}% complete; elapsed time: {3:0.2} min.'.format(counter, len(list_prescription), (counter/len(list_prescription))*100, time_elapsed))
stdout.flush()
# increase counter by 1
counter += 1
Upvotes: 2
Views: 115
Reputation: 366
you can replace this part
# put product_name into a list
list_product_name = list(df_subset['product_name'])
# remove any duplicates
list_product_name = list(dict.fromkeys(list_product_name))
# append list_product_name to list_list_product_name
list_list_product_name.append(list_product_name)
with
list_list_product_name.append(df_subset['product_name'].unique().tolist())
also, you might want to check groupby
Upvotes: 1