Reputation: 3477
I have a dataset that consists of columns 0 to 10, and I would like to extract the information that is only in columns 1 to 5, not 6, and 7 to 9 (it means not the last column). So far, I have done the following:
A = B[:, [[1:5], [7:-1]]]
but I got a syntax error
, how can I retrieve that data?
Upvotes: 6
Views: 4751
Reputation: 13582
One can solve that with the sum of range
[In]: columns = list(range(1,6)) + list(range(7,10))
[Out]:
[1, 2, 3, 4, 5, 7, 8, 9]
Then, considering that your df is called df, using iloc
to select the DF columns
newdf = df.iloc[:, columns]
Upvotes: 1
Reputation: 793
Just to add some of my thoughts. There are two approaches one can take using either numpy or pandas. So I will demonstrate with some data, and assume that the data is the grades for a student in different courses he/she is enrolled in.
import pandas as pd
import numpy as np
data = {'Course A': [84, 82, 81, 89, 73, 94, 92, 70, 88, 95],
'Course B': [85, 82, 72, 77, 75, 89, 95, 84, 77, 94],
'Course C': [97, 94, 93, 95, 88, 82, 78, 84, 69, 78],
'Course D': [84, 82, 81, 89, 73, 94, 92, 70, 88, 95],
'Course E': [85, 82, 72, 77, 75, 89, 95, 84, 77, 94],
'Course F': [97, 94, 93, 95, 88, 82, 78, 84, 69, 78]
}
df = pd.DataFrame(data=data)
df.head()
CA CB CC CD CE CF
0 84 85 97 84 85 97
1 82 82 94 82 82 94
2 81 72 93 81 72 93
3 89 77 95 89 77 95
4 73 75 88 73 75 88
NOTE: CA
through CF
represent Course A
through Course F.
To help us remember column names and their associated indexes, we can build a list of columns and their indexes via list comprehension.
map_cols = [f"{c[0]}:{c[1]}" for c in enumerate(df.columns)]
['0:Course A',
'1:Course B',
'2:Course C',
'3:Course D',
'4:Course E',
'5:Course F']
Now, to select say Course A
, and Course D
through Course F
using indexing in numpy, you can do the following:
df.iloc[:, np.r_[0, 3:df.shape[1]]]
CA CD CE CF
0 84 84 85 97
1 82 82 82 94
2 81 81 72 93
3 89 89 77 95
4 73 73 75 88
You can also use pandas to the same effect.
df[[df.columns[0], *df.columns[3:]]]
CA CD CE CF
0 84 84 85 97
1 82 82 82 94
2 81 81 72 93
3 89 89 77 95
4 73 73 75 88
Upvotes: 2
Reputation: 51335
Another way would be to get your slices independently, and then concatenate:
A = np.concatenate([B[:, 1:6], B[:, 7:-1]], axis=1)
Using similar example data as @jpp:
B = np.random.randint(0, 10, (3, 10))
>>> B
array([[0, 5, 0, 6, 8, 5, 9, 3, 2, 0],
[8, 8, 1, 7, 3, 5, 7, 7, 4, 8],
[5, 5, 5, 2, 3, 1, 6, 4, 9, 6]])
A = np.concatenate([B[:, 1:6], B[:, 7:-1]], axis=1)
>>> A
array([[5, 0, 6, 8, 5, 3, 2],
[8, 1, 7, 3, 5, 7, 4],
[5, 5, 2, 3, 1, 4, 9]])
Upvotes: 2
Reputation: 164613
Advanced indexing doesn't take a list of lists of slices. Instead, you can use numpy.r_
. This function doesn't take negative indices, but you can get round this by using np.ndarray.shape
:
A = B[:, np.r_[1:6, 7:B.shape[1]-1]]
Remember to add 1 to the second part, since a: b
does not include b
, in the same way slice(a, b)
does not include b
. Also note that indexing begins at 0.
Here's a demo:
import numpy as np
B = np.random.randint(0, 10, (3, 11))
print(B)
[[5 8 8 8 3 0 7 2 1 6 7]
[4 3 8 7 3 7 5 6 0 5 7]
[1 0 4 0 2 2 5 1 4 2 3]]
A = B[:,np.r_[1:6, 7:B.shape[1]-1]]
print(A)
[[8 8 8 3 0 2 1 6]
[3 8 7 3 7 6 0 5]
[0 4 0 2 2 1 4 2]]
Upvotes: 5
Reputation: 758
how about union the range?
B[:, np.union1d(range(1,6), range(7,10))]
Upvotes: 1