Reputation: 2981
I have 2 dataframes, one with a list of systems and versions and another with a list of all the systems/versions and whether they are obsolete, and I'm struggling to perform some sort of multi-if statement (including comparators) and populate the data in the first df with the data from the second.
The dataframes are;
>>> system_df
ID SIS_TYPE EDITION VERSION
0 SYS1 AMA first 2
1 SYS2 DIR first 10.3
2 SYS3 DIR second 9
3 SYS4 ORA first 7
>>> obsolete_df
SIS_TYPE EDITION VERSION OBSOLETE
0 AMA first 2 YES
1 DIR first 10 NO
2 DIR * 9.1 NO
3 ORA third 7 YES
4 ORA * 9 NO
5 ORA * 8 YES
What I'm trying to do is the following;
First I need to lookup the SIS_TYPE
in obsolet_df, then I want to check the EDITION
for a match, but if there is not match in EDITION
it should match with the asterisk. After finding a match with both these columns I then want to check the VERSION
column for one of these three things;
Final step would be to add the OBSOLETE
value into the system_df if all of these conditions are satisfied.
So the desired result would be something like;
>>> final_df
ID SIS_TYPE EDITION VERSION OBSOLETE
0 SYS1 AMA first 2 YES
1 SYS2 DIR first 10 NO
2 SYS3 DIR * 9.1 NO
3 SYS4 ORA third 8 YES
I've tried many combinations of .loc
, merge
and join
but I cannot get the dataframe how I need it.
EDIT: I've tried with the following however it's too slow for the quantity of entries in the real dataframe:
l = []
for s_row in system_df.itertuples():
for o_row in obsolete_df.loc[obsolete_df['SUBSYSTEM_TYPE'] == s_row[26]].itertuples():
if s_row[28] == o_row[2]:
l.append((s_row[28], s_row[2]))
elif o_row[2] == '*':
l.append((s_row[28], s_row[2]))
Upvotes: 0
Views: 282
Reputation: 2981
Expanding on greg's answer, looping through the rows of both tables and directly comparing the values seems to work fine. Final code used;
#Iterating through each row in system_df
for s_row in system_df.itertuples():
#Iterating through each row in obsolete_df if the SIS_TYPE matches
for o_row in obsolete_df.loc[obsolete_df['SIS_TYPE'] == s_row[2]].itertuples():
#Comparing Editions and Versions
if s_row[3] == o_row[2] and s_row[4] == o_row[3]:
#Updating the system_df with new values if match
system_df.at[s_row.Index, 'OBSOLETE'] = o_row[4]
#Else if use'*' for Edition and compare Versions
elif o_row[2] == '*' and s_row[4] == o_row[3]:
system_df.at[s_row.Index, 'OBSOLETE'] = o_row[4]
Upvotes: 0
Reputation: 1416
I think it should be like:
l = []
for i in range(len(obsolete_df)):
s_row = system_df[i]
o_row = obsolete_df[i]
if s_row[2] == o_row[1]: # compare SIS_TYPE
if s_row[2] == o_row[1] or o_row[1] == '*': # compare EDITION
l.append((s_row[1], s_row[2], ...))
It hasn't all needed validations, but it will be easy to add them. Sorry, I didn't test it!
Upvotes: 1