amadispstac

Reputation: 687

Append missing values to CSV file

I have a sorted CSV file in the following format:

X,Y
0,0
0,1
0,2
1,0
1,1
2,0
2,1
2,1

Here, the pair 1,2 is absent. This is just a sample; my file contains 1 million records, with a few thousand pairs absent. How can I write a script to detect these missing pairs and append them to the file?

I have tried generating all possible pairs and checking whether each one is present in the file, but this is far too slow:

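import csv

with open('myfile.csv') as csvfile:
    r = csv.reader(csvfile, delimiter=',')
    next(r)  # skip the X,Y header row so int() does not fail on it

    for row in r:
        # Compares every row against all 1,000,000 candidate pairs
        for i in range(1000):
            for j in range(1000):
                if int(row[0]) == i and int(row[1]) == j:
                    pass  # Can perform operations here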

Is there some way I can use NumPy or Pandas (I'm very new to those) to solve this problem?

Upvotes: 2

Views: 191

Answers (1)

Scott Boston

Reputation: 153500

One way using sets:

from itertools import product
import pandas as pd

df1 = pd.read_csv('myfile.csv')

set(product(df1.X.unique(), df1.Y.unique())).difference(set((i[1], i[2]) for i in df1.itertuples()))

Output:

{(1, 2), (2, 2)}
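
The snippet above only finds the missing pairs; the question also asks to append them to the file. A minimal sketch of that step follows, using the same set-difference idea but only the standard library. The helper name `append_missing_pairs` is hypothetical, and the sketch assumes a header row, two integer columns, and a file that already ends with a newline. Like the answer, it builds the candidate grid from the X and Y values actually seen in the file rather than from a fixed range.

```python
import csv
from itertools import product


def append_missing_pairs(path):
    """Append every absent (X, Y) combination to the CSV at `path`.

    Hypothetical helper: assumes a header row, two integer columns,
    and that the file already ends with a newline.
    Returns the sorted list of pairs that were appended.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the "X,Y" header
        # A set gives O(1) membership tests and drops duplicate rows
        present = {(int(r[0]), int(r[1])) for r in reader if r}

    xs = sorted({x for x, _ in present})
    ys = sorted({y for _, y in present})
    # Every combination of observed X and Y values, minus those already present
    missing = sorted(set(product(xs, ys)) - present)

    with open(path, "a", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(missing)
    return missing
```

Calling `append_missing_pairs('myfile.csv')` on the sample data should append the rows 1,2 and 2,2. The appended rows land at the end of the file, so the file is no longer sorted; re-sort it afterwards if the ordering matters.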

Upvotes: 3
