Reputation: 83
I have 2 CSV files.
File1.csv
Frame_Nr; Data1; Data2; Labeled
0 0 1 1
1 0 0 1
2 1 1 1
3 0 0 0
4 0 0 0
5 1 0 1
6 0 0 0
7 0 0 0
11 0 1 1
12 1 1 1
File2.csv
Frame_Nr; Data1; Data2; Labeled
0 0 0 0
1 0 0 0
2 0 0 0
3 0 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
I want the output to look something like this. And should merge file2.csv with file file1.csv and if there are some changes to replace with data from file1.csv else to keep data from file2.csv
Expected output.csv
Frame_Nr; Data1; Data2; Labeled
0 0 1 1
1 0 0 1
2 1 1 1
3 0 0 0
4 0 0 0
5 1 0 1
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 0
10 0 0 0
11 0 1 1
12 1 1 1
My code :
import csv
import os
f = open('file2', 'r')
reader = csv.reader(f, delimiter=';')
reader = list(reader)
f1 = open('file1', 'r')
reader1 = csv.reader(f1, delimiter=';')
next(reader1)
reader1 = list(reader1)
for line1 in reader1:
for line in reader:
if line1[0] != line[0]:
print(line1)
else:
print(line)
Upvotes: 1
Views: 95
Reputation: 2720
Pandas has two very nice functions to help you avoid a nested for loop and make the process more efficient:
import pandas as pd
df1 = pd.read_csv('file1.csv', options='whatever makes your csvs load')
df2 = pd.read_csv('file2.csv', options='whatever makes your csvs load')
df = pd.concat([df1, df2]).drop_duplicates('Frame_Nr')
Optionally, if you want the resulting DataFrame
sorted by Frame_Nr
, chain a .sort_values('Frame_Nr')
to the last line
To explain the code snippet: pd.concat
concatenates both DataFrames so that you first have all rows from file 1 and after that all rows from file 2, the drop_duplicates
after that removes all rows with duplicate values in Frame_Nr
, keeping the first. Since file1
was the first file in the concatenation, all lines from that file are kept and lines from file2
are only retained if they have a frame number that was not in file1
. Optionally, the sort_values
will sort the DataFrame by the frame number column
Upvotes: 2
Reputation: 33147
import pandas as pd
df1 = pd.read_csv("file1.csv", delim_whitespace=True)
df2 = pd.read_csv("file2.csv", delim_whitespace=True)
df=pd.concat([df1, df2]).drop_duplicates('Frame_Nr;').sort_values("Frame_Nr;")
Upvotes: 2