Reputation: 45
How do I merge 2 DF's whilst maintaining the number of rows?
The date is the unique primary key, in fact the 2 dataframes were filtered from the same parent dataframe.
df_afs = df[df['Filesystem'].str.contains(".snapshot") == False]
df_sfs = df[df['Filesystem'].str.contains(".snapshot") == True]
df_all = pd.merge(df_afs,
df_sfs[['date', 'sfs_volume','sfs_kbytes','sfs_used','sfs_avail']],
on = 'date',
how = 'outer')
df_afs
date hostname afs_volume afs_kbytes afs_used afs_avail
3 09/02/2020 00:19:16 nascn01 /vol/vol0/ 347277596 115158404 232119192
5 09/02/2020 00:19:16 nascn01 /vol/vol1/ 996148 492 995656
7 09/02/2020 00:19:16 nascn01 /vol/vol2/ 996148 656 995492
9 09/02/2020 00:19:16 nascn01 /vol/vol3/ 996148 652 995496
11 09/02/2020 00:19:16 nascn01 /vol/vol4/ 996148 564 995584
df_sfs
date hostname sfs_volume sfs_kbytes sfs_used sfs_avail
4 09/02/2020 00:19:16 nascn01 /vol/vol0/.snapshot 18277768 3975892 14301876
6 09/02/2020 00:19:16 nascn01 /vol/vol1/.snapshot 52428 3520 48908
8 09/02/2020 00:19:16 nascn01 /vol/vol2/.snapshot 52428 3208 49220
10 09/02/2020 00:19:16 nascn01 /vol/vol3/.snapshot 52428 3568 48860
12 09/02/2020 00:19:16 nascn01 /vol/vol4/.snapshot 52428 3432 48996
Error Screenshot Error Screenshot
Upvotes: 1
Views: 103
Reputation: 409
you would need to:
specify column of the dataframe on the left and the dataframe on the right
left_on = 'date', right_on = 'date'..
You can visit many articles online to see which merge type would work best for your case, but the fact that the date rows are not unique on may provide difficult in maintaining the same amount of columns as it's not clear how they should merge. For more see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
Maybe you could try and create a new column for this purpose in the first dataframe by concatenating so that the dataframes would rather be joined on
df_afs['afs_volume_rec'] = df_afs['afs_volume_rec'] + '.snapshot'
followed by the merge on this column rather than the date
left_on = 'afs_volume_rec', right_on = 'sfs_volume'
Upvotes: 1