Nordle
Nordle

Reputation: 2981

Multiple IF statements comparing 2 dataframes and populating dataframe based on result

I have 2 dataframes, one with a list of systems and versions and another with a list of all the systems/versions and whether they are obsolete, and I'm struggling to perform some sort of multi-if statement (including comparators) and populate the data in the first df with the data from the second.

The dataframes are;

>>> system_df
    ID    SIS_TYPE   EDITION   VERSION
0   SYS1      AMA     first       2
1   SYS2      DIR     first       10.3
2   SYS3      DIR     second      9
3   SYS4      ORA     first       7




>>> obsolete_df
    SIS_TYPE   EDITION   VERSION    OBSOLETE
0      AMA     first      2          YES
1      DIR     first      10         NO
2      DIR       *        9.1        NO
3      ORA     third      7          YES
4      ORA       *        9          NO
5      ORA       *        8          YES

What I'm trying to do is the following;

First I need to lookup the SIS_TYPE in obsolet_df, then I want to check the EDITION for a match, but if there is not match in EDITION it should match with the asterisk. After finding a match with both these columns I then want to check the VERSION column for one of these three things;

Final step would be to add the OBSOLETE value into the system_df if all of these conditions are satisfied.

So the desired result would be something like;

>>> final_df
    ID    SIS_TYPE   EDITION   VERSION    OBSOLETE
0   SYS1      AMA     first      2          YES
1   SYS2      DIR     first      10         NO
2   SYS3      DIR       *        9.1        NO
3   SYS4      ORA     third      8          YES

I've tried many combinations of .loc, merge and join but I cannot get the dataframe how I need it.

EDIT: I've tried with the following however it's too slow for the quantity of entries in the real dataframe:

        l = []
        for s_row in system_df.itertuples():
            for o_row in obsolete_df.loc[obsolete_df['SUBSYSTEM_TYPE'] == s_row[26]].itertuples():
                if s_row[28] == o_row[2]:
                    l.append((s_row[28], s_row[2]))
                elif o_row[2] == '*':
                    l.append((s_row[28], s_row[2]))

Upvotes: 0

Views: 282

Answers (2)

Nordle
Nordle

Reputation: 2981

Expanding on greg's answer, looping through the rows of both tables and directly comparing the values seems to work fine. Final code used;

#Iterating through each row in system_df
for s_row in system_df.itertuples():
        #Iterating through each row in obsolete_df if the SIS_TYPE matches
        for o_row in obsolete_df.loc[obsolete_df['SIS_TYPE'] == s_row[2]].itertuples():

            #Comparing Editions and Versions
            if s_row[3] == o_row[2] and s_row[4] == o_row[3]:
                #Updating the system_df with new values if match
                system_df.at[s_row.Index, 'OBSOLETE'] = o_row[4]

            #Else if use'*' for Edition and compare Versions
            elif o_row[2] == '*' and s_row[4] == o_row[3]:
                system_df.at[s_row.Index, 'OBSOLETE'] = o_row[4]

Upvotes: 0

greg
greg

Reputation: 1416

I think it should be like:

l = []
for i in range(len(obsolete_df)):
    s_row = system_df[i]
    o_row = obsolete_df[i]
    if s_row[2] == o_row[1]: # compare SIS_TYPE
        if s_row[2] == o_row[1] or o_row[1] == '*': # compare EDITION
            l.append((s_row[1], s_row[2], ...))

It hasn't all needed validations, but it will be easy to add them. Sorry, I didn't test it!

Upvotes: 1

Related Questions