Reputation: 51
How to compare columns and extract values in two csv files similar to Excel VLOOKUP?
a.csv
name,type
test1,A
test2,B
test3,A
test4,E
test5,C
test6,D
b.csv
type,value
A,1.0
B,0.5
C,0.75
D,0.25
Expected output after comparison of "type column", create a new csv file with these values
newfile.csv
name,type,value
test1,A,1.0
test2,B,0.5
test3,A,1.0
test4,E,N/A
test5,C,0.75
test6,D,0.25
So far, codes as below
A = 'a.csv'
B = 'b.csv'
df_B = pd.read_csv(B)
with open(A, 'r') as reference:
with open('newfile.csv', 'w') as results:
reader = csv.reader(reference)
writer = csv.writer(results)
writer.writerow(next(reader, []) + ['value'])
for row in reader:
checkRecords = df_B.loc[df_B['type'] == row[1]]
#checkRecords_A = df_B[df_B.type == row[1]].iloc[0] # IndexError: index 0 is out of bounds for axis 0 with size 0
if checkRecords.empty:
value = 'N/A'
else:
value = checkRecords.value
print(value)
# This value have name and dtype which is not expected
writer.writerow(row + [value])
results.close()
Upvotes: 0
Views: 315
Reputation: 585
Using pandas
, you can merge
two DataFrames where one contains relevant information which will be used in the other DataFrame. Here's an example:
import pandas as pd
csv1 = pd.DataFrame({"name":["test1","test2","test3","test4","test5"],"type":["A","B","C","A","D"]})
csv2 = pd.DataFrame({"type":["A","B","C"],"value":[1,2,3]})
pd.merge(csv1, csv2, on="type", how='outer')
And the output would be:
name type value
test1 A 1.0
test4 A 1.0
test2 B 2.0
test3 C 3.0
test5 D NaN
Upvotes: 3