Reputation: 4774
I am reading a file with pd.read_csv
and removing all the values that are -1
. Here's the code
import pandas as pd
import numpy as np
columns = ['A', 'B', 'C', 'D']
catalog = pd.read_csv('data.txt', sep='\s+', names=columns, skiprows=1)
a = cataog['A']
b = cataog['B']
c = cataog['C']
d = cataog['D']
print len(b) # answer is 700
# remove rows that are -1 in column b
idx = np.where(b != -1)[0]
a = a[idx]
b = b[idx]
c = c[idx]
d = d[idx]
print len(b) # answer is 612
So I am assuming that I have successfully managed to remove all the rows where the value in column b is -1.
In order to test this, I am doing the following naive way:
for i in range(len(b)):
print i, a[i], b[i]
It prints out the values until it reaches a row which was supposedly filtered out. But now it gives a KeyError
.
Upvotes: 2
Views: 2118
Reputation: 76297
If you filter out indices, then
for i in range(len(b)):
print i, a[i], b[i]
will attempt to access erased indices. Instead, you can use the following:
for i, ae, be in zip(a.index, a.values, b.values):
print(i, ae, be)
Upvotes: 1
Reputation: 862481
You can filtering by boolean indexing
:
catalog = catalog[catalog['B'] != -1]
a = cataog['A']
b = cataog['B']
c = cataog['C']
d = cataog['D']
It is expected you get KeyError
, because index values not match, because filtering.
One possible solution is convert Series
to list
s:
for i in range(len(b)):
print i, list(a)[i], list(b)[i]
Sample:
catalog = pd.DataFrame({'A':list('abcdef'),
'B':[-1,5,4,5,-1,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0]})
print (catalog)
A B C D
0 a -1 7 1
1 b 5 8 3
2 c 4 9 5
3 d 5 4 7
4 e -1 2 1
#filtered DataFrame have no index 0, 4
catalog = catalog[catalog['B'] != -1]
print (catalog)
A B C D
1 b 5 8 3
2 c 4 9 5
3 d 5 4 7
5 f 4 3 0
a = catalog['A']
b = catalog['B']
c = catalog['C']
d = catalog['D']
print (b)
1 5
2 4
3 5
5 4
Name: B, dtype: int64
#a[i] in first loop want match index value 0 (a[0]) what does not exist, so KeyError,
#same problem for b[0]
for i in range(len(b)):
print (i, a[i], b[i])
KeyError: 0
#convert Series to list, so list(a)[0] return first value of list - there is no Series index
for i in range(len(b)):
print (i, list(a)[i], list(b)[i])
0 b 5
1 c 4
2 d 5
3 f 4
Another solution should be create default index 0,1,...
by reset_index
with drop=True
:
catalog = catalog[catalog['B'] != -1].reset_index(drop=True)
print (catalog)
A B C D
0 b 5 8 3
1 c 4 9 5
2 d 5 4 7
3 f 4 3 0
a = catalog['A']
b = catalog['B']
c = catalog['C']
d = catalog['D']
#default index values match a[0] and a[b]
for i in range(len(b)):
print (i, a[i], b[i])
0 b 5
1 c 4
2 d 5
3 f 4
Upvotes: 2