Rob
Rob

Reputation: 428

How to Merge two Dataframe using matching indexes?

So I have two DataFrames: Historic and Applet. Historic contains a list of all courses my school offered in the past and Applet is all courses that my school currently offers

I want to merge the two DataFrames so that any items in my Applet DataFrame that don't exist in Historic are added and any that do exist overwrite their copies in Historic (Some courses may have updated information and should overwrite their historic entries with that information..)

I'm currently using Historic.combine_first(Applet) to merge the two by on their Indexes. However, I want the duplicate entries to overwrite their Historic entries not just make a duplicate entry.

Code:

    def update2(self):
        historic = pd.read_csv('course_history.txt', header=None, sep='"', encoding = 'ISO-8859-1',
                               names=['Course_ID', 'Course_Title', 'Professor_Name','Meeting_Time','Enrollment','Room','Year','Term','Credit'],index_col=[0,6,7])
        winnet = pd.DataFrame(self.data, columns =['Course_ID', 'Course_Title', 'Professor_Name','Meeting_Time','Enrollment','Room','Year','Term','Credit'] )
        winnet.set_index(['Course_ID','Year','Term'], inplace=True)
        historic3 = historic.combine_first(winnet)

Historic DataFrame:

Course_ID   Year    Term                                                                           ...
AC 230 01   2020-21 May                         Accounting Systems              Crouse, Justin D.  ...    ROOM NULL   1.00
AC 429 01   2020-21 May                                 CPA Review              Sommermeyer, Eric  ...    ROOM NULL   1.00
ART 150 01  2020-21 May          20th-Century Art, Media, & Design            Fedeler, Barbara J.  ...    ROOM NULL   1.00
ART 208 01  2020-21 May                              Photography I               Payne, Thomas R.  ...    ROOM NULL   1.00
PSY 222 01  2018-19 FA                 Cognitive Psychology                   Eslick Watkins, A    ...    ROOM NULL   1.00

Applet DataFrame:

Course_ID       Year    Term
PSY  101  01    2018-19 FA           Introduction to Psychology            Bane, C    T H  9:35AM-11:15AM    40/44/0  LH   330     1.00
PSY  101  02    2018-19 FA           Introduction to Psychology  Eslick Watkins, A     T H  1:00PM-2:40PM    40/43/0  SC   134     1.00
PSY  210  10    2018-19 FA                    Child Development           Munir, S    T H  9:35AM-11:15AM    30/10/0  LH   327     0.50
PSY  211  20    2018-19 FA               Adolescent Development           Munir, S     T H  1:00PM-2:40PM     30/6/0  LH   330     0.50
PSY  222  01    2018-19 FA                 Cognitive Psychology  Eslick Watkins, A    T H  9:35AM-11:15AM    30/24/0  LH   324     1.00

Upvotes: 0

Views: 32

Answers (1)

Eric Truett
Eric Truett

Reputation: 3010

You can use concat then drop_duplicates

cols = [columns_to_judge_duplicates]
combined = pd.concat([Applet, Historic])
combined = combined.drop_duplicates(subset=cols, method='first')

Upvotes: 1

Related Questions