Pandas: Updating dataframe column from other dataframes

Question

I am facing a problem updating the datetime values on a column combining data from other dataframes. The first df includes the car data of a car rental company:

df_cars = pd.DataFrame([['1708092730', 'MERC', 'GLK270', 'PLT01', '', ''],
                        ['T170900224', 'MERC', 'C200', 'PLT02', '', ''],
                        ['1708092733', 'BMW', 'Χ3', 'PLT03', '', ''],
                        ['T170902781', 'BMW', 'X6', 'PLT04', '', ''],
                        ['T171002176', 'VW', 'PASSAT', 'PLT05', '', ''],
                        ['1708090817', 'FORD', 'MUSTANG', 'PLT06', '', '']],
                columns=['RefNo', 'Brand', 'Model', 'Plate', 'Driver', 'Date'])

The second df consists of the garage outbound records:

df_out = pd.DataFrame([['1708092733', 'John', '2018-1-2', 'MC', 250],
                       ['T170900224', 'Phil', '2018-1-15', 'AMEX', 400],
                       ['T170902781', 'Ann', '2018-1-22', 'VI', 400],
                       ['1708092730', 'Ann', '2018-1-28', 'AMEX', 200],
                       ['1708092733', 'Jeff', '2018-2-1', 'VI', 200]],
               columns=['RefNo', 'Driver', 'Date', 'Payment', 'Deposit'])

and the third df includes the garage inbound records:

df_in = pd.DataFrame([['1708092733', '2018-1-18', 165],
                      ['T170902781', '2018-2-12', 600],
                      ['1708092730', '2018-2-15', 150]], 
              columns=['RefNo', 'Date', 'Ammount'])

The RefNo is a unique barcode on car keys. I am trying to update the df_cars columns:

'Driver' with the name of the last driver that has rented the car, in case it is out of the garage and
'Date' with the last date that the car status has been changed ('out of the garage' or 'in the garage')

The following update code is working fine with a small number of rows

df_OutIn = pd.concat([df_out, df_in])                           # concatenate Garage out & in entries
df_OutIn['Date'] = pd.to_datetime(df_OutIn.Date)                # OutIn df Date column to datetile format 
df_OutIn = df_OutIn.sort_values(by='Date')                      # sort OutIn df by Date
df_OutIn = df_OutIn.drop_duplicates(['RefNo'], keep='last')     # remove duplicates and keep last update per car

df_tmp = df_cars[['RefNo', 'Driver']].set_index('RefNo')        # Update 'Driver' in  df_cars
df_tmp.update(df_OutIn.set_index('RefNo'))
df_cars['Driver'] = df_tmp.values

df_tmp = df_cars[['RefNo', 'Date']].set_index('RefNo')         # Update 'Date' in  df_cars
df_tmp.update(df_OutIn.set_index('RefNo'))
df_cars['Date'] = df_tmp.values
df_cars['Date'] = pd.to_datetime(df_cars.Date)

The df_cars is expected to be updated to

        RefNo Brand    Model  Plate Driver        Date
0  1708092730  MERC   GLK270  PLT01         2018-02-15   
1  T170900224  MERC     C200  PLT02   Phil  2018-01-15    
2  1708092733   BMW       Χ3  PLT03   Jeff  2018-02-01   
3  T170902781   BMW       X6  PLT04         2018-02-12   
4  T171002176    VW   PASSAT  PLT05                NaT
5  1708090817  FORD  MUSTANG  PLT06                NaT

When the whole dataset (25-30k records) is processed i'm getting the warning

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

Are there any suggestions for improving this part of code? Thank you in advance!

Pandas: Updating dataframe column from other dataframes

Answers (1)

Related Questions