Little

Reputation: 3477

Selecting a range of columns in a dataframe

I have a dataset with columns 0 to 10, and I would like to extract only columns 1 to 5 and 7 to 9 (that is, skipping column 6 and the last column). So far, I have tried the following:

 A = B[:, [[1:5], [7:-1]]]

but I got a syntax error. How can I retrieve that data?

Upvotes: 6

Views: 4751

Answers (5)

Gonçalo Peres

Reputation: 13582

One can solve that by concatenating two ranges into a single list of column positions:

[In]: columns = list(range(1,6)) + list(range(7,10))
[Out]: 
[1, 2, 3, 4, 5, 7, 8, 9]

Then, assuming your DataFrame is called df, use iloc to select those columns:

newdf = df.iloc[:, columns]
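For a self-contained version of the two steps above, here is a minimal sketch. The 11-column frame and its column names c0 through c10 are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x11 frame; the names c0..c10 are illustrative only.
df = pd.DataFrame(np.arange(33).reshape(3, 11),
                  columns=[f"c{i}" for i in range(11)])

# Positions 1-5 and 7-9; range() excludes its stop value.
columns = list(range(1, 6)) + list(range(7, 10))

# iloc selects by integer position, independent of the column labels.
newdf = df.iloc[:, columns]
print(list(newdf.columns))
```

Because iloc works on positions, the same list of integers applies whatever the columns happen to be called.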

Upvotes: 1

GSA

Reputation: 793

Just to add some of my thoughts. There are two approaches one can take, using either numpy or pandas. I will demonstrate with some sample data, assumed to be a student's grades in the different courses they are enrolled in.

import pandas as pd
import numpy as np

data = {'Course A': [84, 82, 81, 89, 73, 94, 92, 70, 88, 95],
        'Course B': [85, 82, 72, 77, 75, 89, 95, 84, 77, 94],
        'Course C': [97, 94, 93, 95, 88, 82, 78, 84, 69, 78],
        'Course D': [84, 82, 81, 89, 73, 94, 92, 70, 88, 95],
        'Course E': [85, 82, 72, 77, 75, 89, 95, 84, 77, 94],
        'Course F': [97, 94, 93, 95, 88, 82, 78, 84, 69, 78]
        }

df = pd.DataFrame(data=data)
df.head()


    CA  CB  CC  CD  CE  CF
0   84  85  97  84  85  97
1   82  82  94  82  82  94
2   81  72  93  81  72  93
3   89  77  95  89  77  95
4   73  75  88  73  75  88

NOTE: CA through CF represent Course A through Course F.

To help us remember column names and their positions, we can build a list pairing each position with its name via a list comprehension.

map_cols = [f"{i}:{name}" for i, name in enumerate(df.columns)]
map_cols

['0:Course A',
 '1:Course B',
 '2:Course C',
 '3:Course D',
 '4:Course E',
 '5:Course F']

Now, to select, say, Course A and Course D through Course F by position, using numpy's np.r_ to build the index array, you can do the following:

df.iloc[:, np.r_[0, 3:df.shape[1]]]

    CA  CD  CE  CF
0   84  84  85  97
1   82  82  82  94
2   81  81  72  93
3   89  89  77  95
4   73  73  75  88

You can get the same result with pandas alone, selecting by column label:

df[[df.columns[0], *df.columns[3:]]]

    CA  CD  CE  CF
0   84  84  85  97
1   82  82  82  94
2   81  81  72  93
3   89  89  77  95
4   73  73  75  88
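As a quick check that the positional and label-based selections agree, here is a small sketch. The frame and its column names A through F are made up for illustration; it also assumes the column labels are unique, since selecting by a duplicated label would return every column that carries it.

```python
import numpy as np
import pandas as pd

# Made-up frame with six uniquely named columns (names are illustrative).
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=list("ABCDEF"))

# Positional selection: np.r_ expands 3:6 into 3, 4, 5.
by_position = df.iloc[:, np.r_[0, 3:df.shape[1]]]

# Label selection: first label plus every label from position 3 onward.
by_label = df[[df.columns[0], *df.columns[3:]]]

print(by_position.equals(by_label))
```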

Upvotes: 2

sacuL

Reputation: 51335

Another way would be to get your slices independently, and then concatenate:

A = np.concatenate([B[:, 1:6], B[:, 7:-1]], axis=1)

Using example data similar to @jpp's:

B = np.random.randint(0, 10, (3, 10))

>>> B
array([[0, 5, 0, 6, 8, 5, 9, 3, 2, 0],
       [8, 8, 1, 7, 3, 5, 7, 7, 4, 8],
       [5, 5, 5, 2, 3, 1, 6, 4, 9, 6]])

A = np.concatenate([B[:, 1:6], B[:, 7:-1]], axis=1)

>>> A
array([[5, 0, 6, 8, 5, 3, 2],
       [8, 1, 7, 3, 5, 7, 4],
       [5, 5, 2, 3, 1, 4, 9]])
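The demo above uses a 10-column array, so 7:-1 picks up only columns 7 and 8. As a sketch with 11 columns, matching the question, the same line picks up columns 7 to 9. The array here is made up so that each value in the first row equals its column position.

```python
import numpy as np

# Made-up 3x11 array; row 0 is 0..10, so values double as column positions.
B = np.arange(33).reshape(3, 11)

# Ordinary slices understand negative stops, so 7:-1 means "7 up to, but not including, the last column".
A = np.concatenate([B[:, 1:6], B[:, 7:-1]], axis=1)

print(A[0])
```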

Upvotes: 2

jpp

Reputation: 164613

Advanced indexing doesn't take a list of lists of slices. Instead, you can use numpy.r_. It does not interpret negative indices relative to the array (it has no knowledge of the array's length), but you can get around this by using the array's shape:

A = B[:, np.r_[1:6, 7:B.shape[1]-1]]

Remember to add 1 to the stop value of each range, since a:b does not include b, in the same way slice(a, b) does not include b. Also note that indexing begins at 0.

Here's a demo:

import numpy as np

B = np.random.randint(0, 10, (3, 11))

print(B)

[[5 8 8 8 3 0 7 2 1 6 7]
 [4 3 8 7 3 7 5 6 0 5 7]
 [1 0 4 0 2 2 5 1 4 2 3]]

A = B[:,np.r_[1:6, 7:B.shape[1]-1]]

print(A)

[[8 8 8 3 0 2 1 6]
 [3 8 7 3 7 6 0 5]
 [0 4 0 2 2 1 4 2]]
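To see why the negative stop needs replacing, here is a small sketch on a made-up 3x11 array. np.r_ expands each slice into a plain range of integers, so 7:-1 counts from 7 up to -1 and comes out empty rather than meaning "up to the last column".

```python
import numpy as np

# Made-up 3x11 array; row 0 is 0..10, so values double as column positions.
B = np.arange(33).reshape(3, 11)

# The slice 7:-1 contributes nothing, leaving only 1..5.
naive = np.r_[1:6, 7:-1]
print(naive)

# Spelling out the stop from the array's shape gives the intended positions.
idx = np.r_[1:6, 7:B.shape[1] - 1]
A = B[:, idx]
print(A[0])
```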

Upvotes: 5

Junhee Shin

Reputation: 758

How about taking the union of the ranges?

B[:, np.union1d(range(1,6), range(7,10))]
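One thing worth knowing about this approach: np.union1d returns the sorted, unique values of its inputs. That is harmless for the ranges here, but it means the result cannot express a custom column order or a repeated column. A minimal sketch on a made-up 3x11 array:

```python
import numpy as np

# Made-up 3x11 array; row 0 is 0..10, so values double as column positions.
B = np.arange(33).reshape(3, 11)

idx = np.union1d(range(1, 6), range(7, 10))
A = B[:, idx]
print(A[0])

# Inputs given out of order and with a duplicate come back sorted and deduplicated.
reordered = np.union1d([7, 8, 9], [1, 2, 2])
print(reordered)
```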

Upvotes: 1
