prasoon shukla

Reputation: 11

Performance issue concatenating pandas DataFrames (~6 million rows)

I need some help.

I am trying to concatenate two data frames. The first has 58k rows, the other 100. I want to combine them so that each of the 58k rows is paired with all 100 rows from the other data frame, giving about 5.8 million rows in total. Performance is very poor: it takes about an hour to get through 10%. Any suggestions for improvement? Here is the code snippet:

import numpy as np
import pandas as pd
from IPython.display import clear_output


def myfunc(vendors3, cust_loc):
    cust_loc_vend = pd.DataFrame()
    for i, row in cust_loc.iterrows():
        clear_output(wait=True)
        # turn the single row back into a one-row frame
        a = row.to_frame().T
        # put that row's columns alongside all vendor rows
        df = pd.concat([vendors3, a], axis=1, ignore_index=False)
        # cust_loc_vend = pd.concat([cust_loc_vend, df], axis=1, ignore_index=False)
        cust_loc_vend = cust_loc_vend.append(df)  # note: DataFrame.append is deprecated in newer pandas
        print('Current progress:', np.round(i / len(cust_loc) * 100, 2), '%')
    return cust_loc_vend

For example, if the first DF has 5 rows and the second has 100 rows:

DF1 (sample, 2 columns): see the first screenshot.

DF2: see the second screenshot.

I want a merged DF such that each row in DF2 is paired with all rows from DF1, as in the third screenshot. A small sketch of the desired shape follows.
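To make the desired shape concrete, here is a tiny hand-built sketch; the column names vendor and cust are placeholders, since the real ones are only visible in the screenshots:

import pandas as pd

# Hypothetical tiny frames standing in for DF1 and DF2; the real column
# names are in the screenshots, so these are just placeholders.
df1 = pd.DataFrame({'vendor': ['v1', 'v2', 'v3']})
df2 = pd.DataFrame({'cust': ['c1', 'c2']})

# Desired output, built by hand here: every df2 row paired with every
# df1 row, i.e. 3 * 2 = 6 rows in total.
expected = pd.DataFrame({
    'vendor': ['v1', 'v2', 'v3', 'v1', 'v2', 'v3'],
    'cust':   ['c1', 'c1', 'c1', 'c2', 'c2', 'c2'],
})
print(expected)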

Upvotes: 1

Views: 34

Answers (1)

dper

Reputation: 904

Well, all you are looking for is a join. But since there is no common column, what you can do is create a column with the same constant value in both dataframes, merge on it, and then drop it afterwards.

# add a constant key to both frames so they can be merged on it
df['common'] = 1
df1['common'] = 1

df2 = pd.merge(df, df1, on=['common'], how='outer')

# drop the helper column from the merged result
df2 = df2.drop('common', axis=1)

where df and df1 are dataframes.
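As a side note, if your pandas version is 1.2 or newer you can skip the dummy column entirely, because merge supports a cross join directly. A minimal sketch, assuming df and df1 are your two frames (shown here with small placeholder data):

import pandas as pd

# Small placeholder frames; substitute your actual dataframes here.
df = pd.DataFrame({'a': [1, 2, 3]})
df1 = pd.DataFrame({'b': ['x', 'y']})

# Cross join without the helper column (requires pandas >= 1.2):
# every row of df is paired with every row of df1 in one vectorised call.
df2 = pd.merge(df, df1, how='cross')
print(df2.shape)  # (6, 2)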

Upvotes: 1
