Reputation: 21
I need to compare multiple columns in two separate csv files and when the first 4 columns all match between the two csv files, make a new csv file with the matching four columns and then add the 5th and 7th column of the second csv file to it as well as parse the first csv file column 8 to any of the columns in second csv file. if they match then take the subsequent columns value. I know this sounds confusing but below shows what I am trying to do
First.csv
Address ID X Y col5 col6 col7 col8 col9
Bob 16 1 2 x x x ABC #1
Jon 6 5 7 x x x ABC #2
Rem 16 4 9 x x x ABC #3
Bob 16 2 3 x x x ABC #4
Second.csv
Address ID X Y col5 col6 col7 col8 col9 col10 col11 col12 col13 col14 col15
Bob 16 1 2 x x x def # ABC #5 ghi # jkl #
Bob 16 1 3 x x x def # ABC # ghi # jkl #
Jon 6 5 7 x x x def # ghi # ABC #6 jkl #
Jon 6 5 8 x x x def # ghi # ABC # jkl #
Rem 16 4 9 x x x def # ghi # jkl # ABC #7
Rem 16 4 8 x x x def # ghi # jkl # ABC #
Bob 16 2 3 x x x ABC #8 def # ghi # jkl #
Bob 16 2 6 x x x ABC # def # ghi # jkl #
output.csv
Address ID X Y 1 2 3 4 5 6
Bob 16 1 2 x x x ABC #1 #5
Jon 6 5 7 x x x ABC #2 #6
Rem 16 4 9 x x x ABC #3 #7
Bob 16 2 3 x x x ABC #4 #8
I have been trying to use conditional statements like:
pathFirst = "C:\First.csv"
pathFinal = "C:\Second.csv"
dataFirst = pd.read_csv(pathFirst)
dataFinal = pd.read_csv(pathFinal)
Output = pd.DataFrame(columns=['status'], index=dataFirst.index)
Output['status'] = (dataFirst['Address'] == dataFinal['address'] & dataFirst['ID'] == dataFinal['ID']).replace([True, False], ['Matching', 'Not Matching'])
Output.to_csv('C:\output.csv')
Just to see if there is a line that is actually comparing correctly but seem like there is just a couple. I know I am doing this wrong and I hope this is explained well enough so I really appreciate any help. Thank you.
EDIT: I only want the first four columns to match. Address, ID, X, Y after that I want to match column 4 string value in First.csv file with the same string value in columns 4,6 8 or 10 in Second.csv. If they match then take the #value from the next column. So if in Second.csv row 1 column 6 matches the string from the First.csv file then I want Second.csv row 1 column 7s #value.
Upvotes: 1
Views: 853
Reputation: 9061
Try this
df1 = pd.read_csv('first.csv')
df2 = pd.read_csv('second.csv')
res = df1.merge(df2.iloc[:, :4], on=['Address', 'ID', 'X', 'Y'])
val = [np.where(df2[str(n)].isin(df1['4']), df2[str(n + 1)], np.nan) for n in range(4, 11, 2)]
res['6'] = pd.DataFrame(val).replace('#', np.nan).T.dropna(how='all').bfill(axis=1)[0].to_list()
print(res)
Output:
Address ID X Y 1 2 3 4 5 6
0 Bob 16 1 2 x x x ABC #1 #5
1 Jon 6 5 7 x x x ABC #2 #6
2 Rem 16 4 9 x x x ABC #3 #7
3 Bob 16 2 3 x x x ABC #4 #8
Upvotes: 1