Mrunal Modi
Mrunal Modi

Reputation: 45

Python Pandas merge while maintaining number of rows (just adding columns)

How do I merge 2 DF's whilst maintaining the number of rows?

The date is the unique primary key, in fact the 2 dataframes were filtered from the same parent dataframe.

df_afs = df[df['Filesystem'].str.contains(".snapshot") == False]
df_sfs = df[df['Filesystem'].str.contains(".snapshot") == True]
df_all = pd.merge(df_afs, 
          df_sfs[['date', 'sfs_volume','sfs_kbytes','sfs_used','sfs_avail']],
          on = 'date', 
          how = 'outer')

df_afs

date    hostname    afs_volume  afs_kbytes  afs_used    afs_avail
3   09/02/2020 00:19:16     nascn01     /vol/vol0/  347277596   115158404   232119192
5   09/02/2020 00:19:16     nascn01     /vol/vol1/  996148  492     995656
7   09/02/2020 00:19:16     nascn01     /vol/vol2/  996148  656     995492
9   09/02/2020 00:19:16     nascn01     /vol/vol3/  996148  652     995496
11  09/02/2020 00:19:16     nascn01     /vol/vol4/  996148  564     995584

df_sfs

date    hostname    sfs_volume  sfs_kbytes  sfs_used    sfs_avail
4   09/02/2020 00:19:16     nascn01     /vol/vol0/.snapshot     18277768    3975892     14301876
6   09/02/2020 00:19:16     nascn01     /vol/vol1/.snapshot     52428   3520    48908
8   09/02/2020 00:19:16     nascn01     /vol/vol2/.snapshot     52428   3208    49220
10  09/02/2020 00:19:16     nascn01     /vol/vol3/.snapshot     52428   3568    48860
12  09/02/2020 00:19:16     nascn01     /vol/vol4/.snapshot     52428   3432    48996

Error Screenshot Error Screenshot

Shape

Upvotes: 1

Views: 103

Answers (1)

emiljoj
emiljoj

Reputation: 409

you would need to:

specify column of the dataframe on the left and the dataframe on the right

left_on = 'date', right_on = 'date'..

You can visit many articles online to see which merge type would work best for your case, but the fact that the date rows are not unique on may provide difficult in maintaining the same amount of columns as it's not clear how they should merge. For more see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

Maybe you could try and create a new column for this purpose in the first dataframe by concatenating so that the dataframes would rather be joined on

df_afs['afs_volume_rec'] = df_afs['afs_volume_rec'] + '.snapshot'

followed by the merge on this column rather than the date

left_on = 'afs_volume_rec', right_on = 'sfs_volume'

Upvotes: 1

Related Questions