Reputation: 4784
I have a table made of 380 rows and 20 columns. I want to remove rows from this table following a certain condition.
To clarify things, let's say I have the list:
names = ['John', 'Amy', 'Daniel']
I want to remove the data of all the people whose name is found in the list names.
For example, let's say my data looks something like this:
John 82 3.12 boy
Katy 12 1.12 girl
Amy 42 2.45 girl
Robert 32 1.56 boy
Daniel 47 2.10 boy
I want to remove the data of John, Amy, and Daniel. So the output should be:
Katy 12 1.12 girl
Robert 32 1.56 boy
import csv
import numpy as np

# loading data
data = np.genfromtxt('file.txt', dtype = None)

csvfile = "home/paula/Desktop/test.txt"
with open(csvfile, 'w') as output:
    writer = csv.writer(output, delimiter = '\t')
    for row in range(len(data)):
        if data[row][0] == (i for i in names):
            print 'removing the data of', i, '...'
        else:
            writer.writerow([data[row][0], data[row][1],
                             data[row][2], data[row][3]])
My code runs without errors, but the rows are not removed. When I open the new test.txt file, I can see that the data of John, Amy, and Daniel is still there.
I am certain that the bug is in if data[row][0] == (i for i in names):
How can I fix this?
Upvotes: 0
Views: 131
Reputation: 32521
The condition should be written:
if data[row][0] in names:
In your current code, (i for i in names) creates a generator, and you are then testing whether the string is equal to the generator object itself, which is always False:
>>> (i for i in names)
<generator object <genexpr> at 0x1060564b0>
>>> 'John' == (i for i in names)
False
>>>
Instead, you can test whether an item is in a list as follows:
>>> names = ['John', 'Amy', 'Daniel']
>>> 'John' in names
True
>>> 'Bob' in names
False
>>>
As mentioned in the comments, you can make this check more efficient by converting names to a set before iterating over the rows. But ideally you would use the Pandas library to manipulate csv/table data. See this answer for a similar example. You can negate the condition with df[~df.Name.isin(...)].
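The set conversion mentioned above can be sketched as follows. This is a minimal illustration in Python 3 using only the standard library, not the asker's original Python 2 and NumPy code. The helper name drop_rows is hypothetical, and the in-memory string raw stands in for file.txt, assuming tab-separated fields.

```python
import csv
import io

names = ['John', 'Amy', 'Daniel']

# Sample rows from the question, tab-separated (stands in for file.txt).
raw = (
    "John\t82\t3.12\tboy\n"
    "Katy\t12\t1.12\tgirl\n"
    "Amy\t42\t2.45\tgirl\n"
    "Robert\t32\t1.56\tboy\n"
    "Daniel\t47\t2.10\tboy\n"
)

def drop_rows(rows, excluded):
    """Return the rows whose first field is not in `excluded`."""
    excluded = set(excluded)  # build the set once, before the loop
    return [row for row in rows if row[0] not in excluded]

rows = list(csv.reader(io.StringIO(raw), delimiter='\t'))
kept = drop_rows(rows, names)

# Write the surviving rows back out in the same tab-separated layout.
out = io.StringIO()
csv.writer(out, delimiter='\t', lineterminator='\n').writerows(kept)
print(out.getvalue(), end='')
```

With three names the speed difference is negligible. The set only starts to matter when the list of names to exclude is long, because each membership test on a set takes roughly constant time.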
Upvotes: 4
Reputation: 2480
if data[row][0] == (i for i in names):
    print 'removing the data of', i, '...'
In this portion, i is used inside (i for i in names) as a variable local to the generator expression, so you cannot use it in the print line that follows. Do the check with if data[row][0] in names: and print the name taken from the row itself. You can try:
if data[row][0] in names:
    print 'removing the data of', data[row][0], '...'
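The scoping point can be checked with a small sketch. It is written in Python 3, so print is a function here, unlike in the question's Python 2 code. The function names loop_variable_leaks and removal_messages are hypothetical, chosen only for this illustration.

```python
names = ['John', 'Amy', 'Daniel']

def loop_variable_leaks(candidates):
    """Report whether `i` is visible outside the generator expression."""
    gen = (i for i in candidates)  # `i` lives only inside this expression
    try:
        i  # looked up in the enclosing scopes, where it was never defined
    except NameError:
        return False
    return True

def removal_messages(rows, excluded):
    """Build one message per removed row, naming the row's own first field."""
    return ['removing the data of %s ...' % row[0]
            for row in rows if row[0] in excluded]

data = [('John', 82), ('Katy', 12), ('Amy', 42)]
print(loop_variable_leaks(names))
for message in removal_messages(data, names):
    print(message)
```

Taking the name from the row avoids depending on any variable defined inside the generator expression.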
Upvotes: 0
Reputation: 1774
You're checking whether data[row][0] is the same as (i for i in names). What you want to do is check whether it's the same as one of the elements of (i for i in names). You could do that this way:
any([data[row][0]==i for i in names])
You could also do it the non-ridiculous way, with the in operator:
data[row][0] in names
This checks whether any of the elements of names is the same as data[row][0].
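As a quick check that the two forms agree, here is a small sketch in Python 3. The helper name matches_any is hypothetical. It passes a generator expression to any() rather than the list used above, so the scan can stop at the first match without building a list first.

```python
names = ['John', 'Amy', 'Daniel']

def matches_any(value, candidates):
    # any() consumes the generator lazily and stops at the first True
    return any(value == c for c in candidates)

results = {}
for value in ['John', 'Katy']:
    # Both spellings should give the same answer for every value.
    results[value] = (matches_any(value, names), value in names)
    print(value, results[value])
```

For a list of strings the in operator performs the same element-by-element comparison, which is why it is the shorter way to write this check.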
Upvotes: 0