Aaron England
Aaron England

Reputation: 1273

Faster nested list generation in Python

I have a [long] pandas data frame with 2 columns. The first column is for a prescription number (keep in mind these are not unique as multiple rows can have the same prescription number). The second column is 1 item in that transaction number. I want to create a list of items for each transaction number (with duplicates removed) and put each of these lists into a larger, nested list with the length equaling the number of UNIQUE transaction numbers.

I have successfully achieved this feat, however, it takes a while to run and I would like to know a better (i.e., faster) way of doing it. My code is below:

# get the unique values for prescription
list_prescription = list(pd.value_counts(df['prescription']).index)

# make a list of product_name for each tx_plan_id_date (this will be time consuming)
time_start = datetime.datetime.now()
counter = 1
list_list_product_name = []
for prescription in list_prescription:
    # subset to just that tx_plan_id_date
    df_subset = df[df['prescription'] == prescription]
    # put product_name into a list
    list_product_name = list(df_subset['product_name'])
    # remove any duplicates
    list_product_name = list(dict.fromkeys(list_product_name))
    # append list_product_name to list_list_product_name
    list_list_product_name.append(list_product_name)
    # get current time
    time_current = datetime.datetime.now()
    # get minutes elapsed from time_start
    time_elapsed = (time_current - time_start).seconds/60
    # print a message to the console for status
    stdout.write('\r{0}/{1}; {2:0.4f}% complete; elapsed time: {3:0.2} min.'.format(counter, len(list_prescription), (counter/len(list_prescription))*100, time_elapsed))
    stdout.flush()
    # increase counter by 1
    counter += 1

Upvotes: 2

Views: 115

Answers (1)

FrancescoLS
FrancescoLS

Reputation: 366

you can replace this part

# put product_name into a list
list_product_name = list(df_subset['product_name'])    
# remove any duplicates
list_product_name = list(dict.fromkeys(list_product_name))
# append list_product_name to list_list_product_name
list_list_product_name.append(list_product_name)

with

list_list_product_name.append(df_subset['product_name'].unique().tolist())

also, you might want to check groupby

Upvotes: 1

Related Questions