alok sharma

Reputation: 35

Inner join of HDF5 dataframes with vaex in Python

I need to compare two CSV files and do an inner join. I am using vaex, which is faster than pandas, but I got stuck. My code worked with pandas, but it was slow. How can I inner join two HDF5 files and write the output to CSV?

My code

    vaex_df1 = vaex.from_csv(file1,convert=True, chunk_size=5_000)
    vaex_df2 = vaex.from_csv(file2,convert=True, chunk_size=5_000)
    vaex_df1 = vaex.open(file1+'.hdf5')
    vaex_df2 = vaex.open(file2+'.hdf5')
    print(type(vaex_df1),vaex_df1)
    print(type(vaex_df2),vaex_df2)
    df_join = pd.merge(vaex_df1,vaex_df2,how='inner',left_on ='CL_CLIENT_ID',right_on='CL_CLIENT_ID')
    df_join.to_csv('C:\\Users\\abc\\Desktop\\New folder\\file3.csv')
    print("success in compare")

Just as we merge in pandas, is there a way to do an inner join in vaex? I couldn't find much about it online. The code fails at the 'df_join = pd.merge' line, which is to be expected, since pd.merge is a pandas function and these are vaex dataframes.

Upvotes: 1

Views: 2183

Answers (1)

Peter Leimbigler

Reputation: 11105

The vaex tutorial has a section on joining: https://vaex.io/docs/tutorial.html#Joining. The API looks similar to that of pandas. Try:

    df_join = vaex_df1.join(vaex_df2,
                            how='inner',
                            left_on='CL_CLIENT_ID',
                            right_on='CL_CLIENT_ID')
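
To also cover the CSV output asked about in the question, here is a minimal sketch rather than a definitive implementation. It assumes a vaex version that provides `DataFrame.export_csv` (vaex 4.x, as far as I know). The helper name `inner_join_to_csv`, its parameters, and the `_right` suffix are made up for illustration and are not part of vaex. The `rsuffix` argument is passed because vaex, unlike pandas, may raise a column name collision error when both dataframes contain a column with the same name, such as the join key.

```python
def inner_join_to_csv(hdf5_left, hdf5_right, csv_out, key="CL_CLIENT_ID"):
    """Inner join two HDF5 files on `key` and write the result to CSV.

    Hypothetical helper for illustration; not part of vaex.
    """
    # Imported inside the function so that defining the helper does not
    # by itself require vaex to be installed.
    import vaex

    left = vaex.open(hdf5_left)
    right = vaex.open(hdf5_right)

    # rsuffix is intended to disambiguate column names that exist in both
    # dataframes (including the key); "_right" is an arbitrary choice.
    joined = left.join(right, on=key, how="inner", rsuffix="_right")

    # Assumption: export_csv is available in the installed vaex version.
    joined.export_csv(csv_out)
    return joined
```

With the names from the question, the call would look something like `inner_join_to_csv(file1 + '.hdf5', file2 + '.hdf5', 'file3.csv')`. If the right-hand file contains duplicate keys, vaex may refuse the join by default; the `allow_duplication=True` argument of `join` is meant for that case, so check the documentation for your version.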

Upvotes: 1
