Reputation: 1571
I am working with a CSV file to analyze lecture feedback data. The format looks like this:
"5631","18650","10",,,"2015-09-18 09:35:11"
"18650","null","10",,,"2015-09-18 09:37:12"
"18650","5631","10",,,"2015-09-18 09:37:19"
"58649","null","6",,,"2015-09-18 09:38:13"
"45379","31541","10","its friday","nothing yet keep it up","2015-09-18 09:39:46"
I am trying to get rid of bad data. Only entries with "id1","id2" that have a corresponding "id2","id1" entry are considered valid.
I am using nested loops to try to find a matching entry for each row. However, the outer loop seems to stop halfway for no reason. Here's my code:
class Filter:
    file1 = open('EncodedPeerInteractions.FA2015.csv')
    peerinter = csv.reader(file1,delimiter=',')
    def __init__(self):
        super()
    def filter(self):
        file2 = open('FilteredInteractions.csv','a')
        for row in self.peerinter:
            print(row)
            if row[0] == 'null' or row[1] == 'null':
                continue
            id1 = int(row[0])
            id2 = int(row[1])
            for test in self.peerinter:
                if test[0] == 'null' or test[1] == 'null':
                    continue
                if int(test[0]) == id2 and int(test[1]) == id1:
                    file2.write("\n")
                    file2.write(str(row))
                    break
        file2.close()
I have tried using pdb to step through the code. Everything was fine for the first couple of iterations, and then execution suddenly jumped to file2.close() and returned. The program does print out a few valid entries, but far fewer than expected.
I tested the CSV file and it is loaded into memory properly, with over 18000 entries. I also tested using print instead of writing to the file, and it gives the same result, so nothing is wrong with the append file.
Edit
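Now I understand what the problem is. As this question says, both loops iterate over the same csv.reader object. I break out of the inner loop when there's a match, but when there's no match, the inner loop consumes the rest of the file without resetting it. When control returns to the outer loop, there is nothing left to read, so it simply ends. I should convert the reader into a list or reset the file between passes.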
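A minimal sketch of the list-based fix described in the edit, assuming the same column layout as the sample rows. The function name `find_reciprocal_rows` and the in-memory `SAMPLE` data are illustrative and not from the original code. `csv.reader` returns a one-shot iterator, so materializing it with `list()` lets the inner loop start from the first row for every outer row.

```python
import csv
import io

# Sample rows in the question's format (illustrative in-memory data).
SAMPLE = '''"5631","18650","10",,,"2015-09-18 09:35:11"
"18650","null","10",,,"2015-09-18 09:37:12"
"18650","5631","10",,,"2015-09-18 09:37:19"
"58649","null","6",,,"2015-09-18 09:38:13"
"45379","31541","10","its friday","nothing yet keep it up","2015-09-18 09:39:46"
'''


def find_reciprocal_rows(lines):
    """Return rows (id1, id2, ...) for which a row (id2, id1, ...) also exists."""
    # list() materializes the reader so it can be iterated more than once.
    rows = list(csv.reader(lines, delimiter=','))
    valid = []
    for row in rows:
        # Skip blank/short rows and rows with a missing ID.
        if len(row) < 2 or row[0] == 'null' or row[1] == 'null':
            continue
        for test in rows:  # restarts from the first row every time
            if len(test) >= 2 and test[0] == row[1] and test[1] == row[0]:
                valid.append(row)
                break
    return valid


# A reader is exhausted after one full pass; a second pass yields nothing.
reader = csv.reader(io.StringIO(SAMPLE))
first_pass = len(list(reader))
second_pass = len(list(reader))

result = find_reciprocal_rows(io.StringIO(SAMPLE))
```

With over 18000 rows the nested loop is quadratic, so this is only meant to show why the list fixes the early exit, not to be the fastest approach.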
Upvotes: 1
Views: 858
Reputation: 3931
Try doing something like the following:
import csv

def filter(file1, file2):
    with open(file1, 'r') as f1:
        # Pass the open file object, not the filename, to csv.reader
        peerinter = csv.reader(f1, delimiter=',')
        with open(file2, 'a') as f2:
            for row in peerinter:
                ...
Using the with open() syntax wraps it in a context manager, which will ensure that the file is closed properly at the end. I'm guessing that your problem stems from the fact that you are opening one file as a class variable, and the other inside the method.
Upvotes: 0
Reputation: 103764
You are making this way more complicated than it needs to be.
Given:
$ cat /tmp/so.csv
"5631","18650","10",,,"2015-09-18 09:35:11"
"18650","null","10",,,"2015-09-18 09:37:12"
"18650","5631","10",,,"2015-09-18 09:37:19"
"58649","null","6",,,"2015-09-18 09:38:13"
"45379","31541","10","its friday","nothing yet keep it up","2015-09-18 09:39:46"
You can use csv and filter to get what you want:
>>> import csv
>>> with open('/tmp/so.csv') as f:
...     list(filter(lambda row: 'null' not in row[0:2], csv.reader(f)))
...
[['5631', '18650', '10', '', '', '2015-09-18 09:35:11'],
['18650', '5631', '10', '', '', '2015-09-18 09:37:19'],
['45379', '31541', '10', 'its friday', 'nothing yet keep it up', '2015-09-18 09:39:46']]
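The filter above only drops rows containing 'null'. The reciprocal-pair check from the question can be layered on top of it. Here is a rough sketch, assuming both IDs sit in the first two columns; `reciprocal_only` is a hypothetical helper name, and the sample data is inlined rather than read from /tmp/so.csv. It collects every (id1, id2) pair into a set, then keeps only rows whose reversed pair is also present, which avoids the nested loop.

```python
import csv
import io


def reciprocal_only(lines):
    """Keep rows whose (id1, id2) pair also appears reversed as (id2, id1)."""
    # Drop short rows and rows with 'null' in either ID column.
    rows = [row for row in csv.reader(lines)
            if len(row) >= 2 and 'null' not in row[0:2]]
    # Set of every (id1, id2) pair seen, for constant-time membership tests.
    pairs = {(row[0], row[1]) for row in rows}
    return [row for row in rows if (row[1], row[0]) in pairs]


sample = io.StringIO(
    '"5631","18650","10",,,"2015-09-18 09:35:11"\n'
    '"18650","null","10",,,"2015-09-18 09:37:12"\n'
    '"18650","5631","10",,,"2015-09-18 09:37:19"\n'
    '"45379","31541","10","its friday","nothing yet keep it up","2015-09-18 09:39:46"\n'
)
kept = reciprocal_only(sample)
```

On this sample, the 45379/31541 row is dropped because no 31541/45379 row exists, leaving only the two 5631/18650 entries.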
Upvotes: 1