Reputation: 200
This is how my data looks:
print(len(y_train),len(index_1))
index_1 = pd.DataFrame(data=index_1)
print("y_train: ")
print(y_train)
print("index_1: ")
print(index_1)
Output:
1348 555
y_train:
1677 1
1519 0
1114 0
690 1
1012 1
..
1893 1
1844 0
1027 1
1649 1
1789 1
Name: Team 1 Win, Length: 1348, dtype: int64
index_1:
0
0 0
1 2
2 6
3 7
4 8
.. ...
550 1335
551 1341
552 1342
553 1344
554 1346
I want to remove a number of rows (index_1) from a pandas Series (y_train). The values in the index_1 DataFrame are the positions of the rows I want to remove. The problem is that y_train's index is not in order, so when index_1's first item is 0, I want it to remove the first row in y_train (i.e. the one with index 1677), instead of the row with index label 0. This is my attempt:
y_train_short = y_train.drop(index_1)
This is what I get:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-57-49f2cce7bac0> in <module>
22 print(index_1)
23 print(index_1)
---> 24 y_train_short = y_train.drop(index_1)
25
26
~/miniconda3/lib/python3.7/site-packages/pandas/core/series.py in drop(self, labels, axis, index, columns, level, inplace, errors)
4137 level=level,
4138 inplace=inplace,
-> 4139 errors=errors,
4140 )
4141
~/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
3934 for axis, labels in axes.items():
3935 if labels is not None:
-> 3936 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
3937
3938 if inplace:
~/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py in _drop_axis(self, labels, axis, level, errors)
3968 new_axis = axis.drop(labels, level=level, errors=errors)
3969 else:
-> 3970 new_axis = axis.drop(labels, errors=errors)
3971 result = self.reindex(**{axis_name: new_axis})
3972
~/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in drop(self, labels, errors)
5016 if mask.any():
5017 if errors != "ignore":
-> 5018 raise KeyError(f"{labels[mask]} not found in axis")
5019 indexer = indexer[~mask]
5020 return self.delete(indexer)
KeyError: '[0] not found in axis'
Even setting aside the fact that index label 0 doesn't exist in y_train, I imagine that if it did, drop would not do what I want. So how do I remove the right rows from y_train?
Upvotes: 0
Views: 195
Reputation: 31011
Note that y_train.iloc[index_1[0]] retrieves rows from y_train at the indicated integer positions.
When you run y_train.iloc[index_1[0]].index, you get the index labels of these rows.
So to drop these rows, you can run:
y_train.drop(y_train.iloc[index_1[0]].index, inplace=True)
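To see the idea on a small scale, here is a minimal sketch. The five-row y_train and three-row index_1 below are made-up stand-ins for the data in the question, and the names positions, labels_to_drop, y_train_short and y_train_masked are only for illustration. It translates the integer positions into index labels and drops those, then shows a positional boolean mask as an alternative, which may be safer if the index could contain duplicate labels.

```python
import numpy as np
import pandas as pd

# Toy stand-in for y_train: the index labels are shuffled, as in the question.
y_train = pd.Series(
    [1, 0, 0, 1, 1],
    index=[1677, 1519, 1114, 690, 1012],
    name="Team 1 Win",
)

# Integer positions to remove, stored in a one-column DataFrame like index_1.
index_1 = pd.DataFrame(data=[0, 2, 3])
positions = index_1[0].to_numpy()

# Translate positions into index labels, then drop by label.
labels_to_drop = y_train.index[positions]
y_train_short = y_train.drop(labels_to_drop)
print(y_train_short)

# Alternative: a purely positional boolean mask, which never looks at labels.
mask = np.ones(len(y_train), dtype=bool)
mask[positions] = False
y_train_masked = y_train[mask]
print(y_train_masked)
```

Both variants return a new Series and leave y_train itself unchanged, unlike the inplace=True call above. In this toy data, positions 0, 2 and 3 correspond to labels 1677, 1114 and 690, so the rows labelled 1519 and 1012 remain.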
Upvotes: 1