Reputation: 687
I have a sorted CSV file in the following format:
X,Y
0,0
0,1
0,2
1,0
1,1
2,0
2,1
2,1
Here, the pair 1,2 is absent. This is just a sample; my file contains 1 million records, with a few thousand absent. How can I write a script to detect these missing pairs and append them to the file?
I have tried generating all possible pairs and checking whether each one is present in the file, but this is way too slow:
import csv

with open('myfile.csv') as csvfile:
    r = csv.reader(csvfile, delimiter=',')
    next(r)  # skip the X,Y header row
    for row in r:
        # Compare every row against every candidate pair (very slow)
        for i in range(1000):
            for j in range(1000):
                if int(row[0]) == i and int(row[1]) == j:
                    # Can perform operations here
                    pass
Is there some way I can use NumPy or pandas (I'm very new to both) to solve this problem?
Upvotes: 2
Views: 191
Reputation: 153500
One way using sets:
from itertools import product
import pandas as pd
df1 = pd.read_csv('myfile.csv')
set(product(df1.X.unique(), df1.Y.unique())).difference(set((i[1], i[2]) for i in df1.itertuples()))
Output:
{(1, 2), (2, 2)}
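To also append the missing pairs to the file, as the question asks, something along these lines might work. This is only a sketch: the helper names find_missing_pairs and append_missing_pairs are made up for illustration, and it assumes the columns are named X and Y, that the file ends with a newline, and that the set of valid values is whatever already appears in each column. If the full grid is known in advance (for example range(1000) for both columns), xs and ys could be replaced with that range instead.

```python
from itertools import product
import csv

import pandas as pd


def find_missing_pairs(df):
    """Return the sorted (X, Y) pairs that do not occur in df.

    Assumption: the candidate values are the ones already observed
    in each column, as in the set-based approach above.
    """
    # tolist() converts NumPy integers to plain Python ints
    present = set(zip(df["X"].tolist(), df["Y"].tolist()))
    xs = sorted(set(df["X"].tolist()))
    ys = sorted(set(df["Y"].tolist()))
    return sorted(set(product(xs, ys)) - present)


def append_missing_pairs(path):
    """Append the missing pairs to the CSV at path and return them.

    Assumption: the file already ends with a newline, so the first
    appended row starts on its own line.
    """
    df = pd.read_csv(path)
    missing = find_missing_pairs(df)
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(missing)
    return missing
```

Calling append_missing_pairs('myfile.csv') on the sample data from the question returns [(1, 2), (2, 2)] and writes those two rows at the end of the file. The file is no longer sorted afterwards; if order matters, re-reading it and calling df.sort_values(['X', 'Y']) before writing it back is one option. Building the set of present pairs once keeps each membership check constant-time, which avoids the nested loops in the question.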
Upvotes: 3