SCool

Reputation: 3375

How to slice multiple sections of dataframe by column name?

How do I get the first column, then add another slice?

For example:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(6, 6), columns=list('abcdef'))

          a         b         c         d         e         f
0  0.147163  0.710360  0.069732  0.180949  0.694066  0.639505
1  0.771643  0.094805  0.371702  0.177538  0.089168  0.420331
2  0.431394  0.790537  0.378049  0.402930  0.350409  0.827950
3  0.421411  0.451595  0.703630  0.469526  0.612122  0.076728
4  0.854117  0.302925  0.664647  0.664098  0.959504  0.637122
5  0.659791  0.525526  0.007151  0.448761  0.738571  0.349142

I am trying to get column a, and all columns after c.

This gives me just columns c,d,e,f:

df.loc[:'a', 'c':]

This doesn't work at all:

df.loc['a':'a', 'c':]

I made a few more attempts, but they were just random guessing, and I genuinely can't find a solution online.

Note: I am working with a huge real dataframe, so it would be impractical to write out individual column names like df.loc[:, ['a', 'c', 'd', 'e', 'f']]

Upvotes: 3

Views: 565

Answers (5)

sammywemmy

Reputation: 28649

One option for flexible column selection is with select_columns from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

df.select_columns('a', slice('c', None))

          a         c         d         e         f
0  0.147163  0.069732  0.180949  0.694066  0.639505
1  0.771643  0.371702  0.177538  0.089168  0.420331
2  0.431394  0.378049  0.402930  0.350409  0.827950
3  0.421411  0.703630  0.469526  0.612122  0.076728
4  0.854117  0.664647  0.664098  0.959504  0.637122
5  0.659791  0.007151  0.448761  0.738571  0.349142

Upvotes: 1

Ethan King

Reputation: 151

You can index a dataframe with a list of column names to get the columns you want. First, grab a list of all the columns, then slice that list and pass the sliced list to the dataframe.

df.columns.to_list()
['a', 'b', 'c', 'd', 'e', 'f']

Slice the list:

cols = df.columns.to_list()
cols = cols[:1] + cols[2:]
cols
['a', 'c', 'd', 'e', 'f']

Index df with cols:

df[cols]
          a         c         d         e         f
0  0.749754  0.291974  0.638897  0.768337  0.255553
1  0.541221  0.816086  0.472628  0.276530  0.946075
2  0.811953  0.692716  0.729467  0.512503  0.589812
3  0.613418  0.588730  0.497962  0.122666  0.153101
4  0.600428  0.897041  0.643585  0.382276  0.164303
5  0.165782  0.107455  0.149544  0.309294  0.544864
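Since the question asks for selection by column name, the hardcoded positions above can be looked up from the names instead. A minimal sketch, assuming the column labels are unique and that 'a' and 'c' are the labels of interest (the variable names start, selected and result are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(6, 6), columns=list("abcdef"))

cols = df.columns.to_list()
start = cols.index("c")          # position of the first column of the trailing slice
selected = ["a"] + cols[start:]  # column 'a' plus everything from 'c' onward
result = df[selected]

print(selected)
```

This stays a plain list operation, so it should scale to a wide dataframe without typing out each name.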

Upvotes: 2

BENY

Reputation: 323226

We can use np.r_ (this needs import numpy as np):

df.iloc[:,np.r_[0,2:df.shape[1]]]
Out[99]: 
          a         c         d         e         f
0  0.147163  0.069732  0.180949  0.694066  0.639505
1  0.771643  0.371702  0.177538  0.089168  0.420331
2  0.431394  0.378049  0.402930  0.350409  0.827950
3  0.421411  0.703630  0.469526  0.612122  0.076728
4  0.854117  0.664647  0.664098  0.959504  0.637122
5  0.659791  0.007151  0.448761  0.738571  0.349142
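To see what np.r_ contributes here, it may help to look at the index array on its own. A small sketch, rebuilding a 6x6 frame like the one in the question (the names positions and result are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(6, 6), columns=list("abcdef"))

# np.r_ concatenates scalars and slices into one flat integer array.
# The stop of the slice is given explicitly here as df.shape[1].
positions = np.r_[0, 2:df.shape[1]]
print(positions)  # [0 2 3 4 5]

# iloc then picks exactly those column positions.
result = df.iloc[:, positions]
print(list(result.columns))  # ['a', 'c', 'd', 'e', 'f']
```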

To get the position, use get_indexer:

df.columns.get_indexer(['c'])
Out[100]: array([2], dtype=int64)
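One thing worth checking when using get_indexer: it does not raise for a label that is missing, it returns -1, and a -1 passed on to iloc would silently select the last column. A short sketch of the difference from get_loc, using a hypothetical missing label 'z':

```python
import pandas as pd

columns = pd.Index(list("abcdef"))

# get_indexer takes a list of labels and returns an array of positions;
# a label that is not present is reported as -1 rather than raising.
positions = columns.get_indexer(["c", "z"])
print(positions)

# get_loc takes a single label and raises KeyError when it is missing.
try:
    columns.get_loc("z")
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```

If the labels come from user input, checking for -1 before building the index array is probably a sensible guard.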

Generalized

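def drop_from_here_to_there(df, here, there):
    # Keep everything up to and including `here`, and from `there` onward,
    # dropping the columns strictly in between. Requires `import numpy as np`.
    m = df.shape[1]
    i, j = df.columns.get_indexer([here, there])
    k = np.r_[0:i+1, j:m]
    return df.iloc[:, k]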

drop_from_here_to_there(df, 'a', 'c')

          a         c         d         e         f
0  0.147163  0.069732  0.180949  0.694066  0.639505
1  0.771643  0.371702  0.177538  0.089168  0.420331
2  0.431394  0.378049  0.402930  0.350409  0.827950
3  0.421411  0.703630  0.469526  0.612122  0.076728
4  0.854117  0.664647  0.664098  0.959504  0.637122
5  0.659791  0.007151  0.448761  0.738571  0.349142

Upvotes: 5

piRSquared

Reputation: 294218

drop

df.drop('b', axis=1)

          a         c         d         e         f
0  0.147163  0.069732  0.180949  0.694066  0.639505
1  0.771643  0.371702  0.177538  0.089168  0.420331
2  0.431394  0.378049  0.402930  0.350409  0.827950
3  0.421411  0.703630  0.469526  0.612122  0.076728
4  0.854117  0.664647  0.664098  0.959504  0.637122
5  0.659791  0.007151  0.448761  0.738571  0.349142
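With a huge dataframe there may be many columns between the two names rather than a single 'b'. A possible sketch of the same drop idea driven by the names 'a' and 'c', assuming unique column labels so that get_loc returns an integer (the names i, j and result are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(6, 6), columns=list("abcdef"))

cols = df.columns
i, j = cols.get_loc("a"), cols.get_loc("c")

# Drop every column strictly between 'a' and 'c'; here that is just 'b'.
result = df.drop(columns=cols[i + 1 : j])

print(list(result.columns))  # ['a', 'c', 'd', 'e', 'f']
```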

Upvotes: 2

jezrael

Reputation: 862406

As I understand the question, you want to select by column names.

This is not straightforward: you first need the positions from Index.get_loc, then pass them to numpy.r_ and select with DataFrame.iloc (this needs import numpy as np):

a = df.columns.get_loc('a')
b = df.columns.get_loc('c')
c = len(df.columns)

df = df.iloc[:, np.r_[a, b:c]]
print(df)
          a         c         d         e         f
0  0.210653  0.218035  0.845753  0.456271  0.279802
1  0.932892  0.909715  0.043418  0.707115  0.483889
2  0.444221  0.040683  0.332754  0.947120  0.617660
3  0.368875  0.206132  0.165066  0.361817  0.863353
4  0.509402  0.950252  0.815966  0.322974  0.972098
5  0.987351  0.655923  0.405653  0.257348  0.082653
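If you would rather avoid positions entirely, a different technique is to take each piece with a label-based selection and join the pieces with pd.concat. This is only a sketch of that alternative, not the approach above; it assumes unique column labels and that the two pieces do not overlap:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(6, 6), columns=list("abcdef"))

# First piece: column 'a' as a one-column DataFrame.
# Second piece: label-based slice from 'c' to the last column.
result = pd.concat([df[["a"]], df.loc[:, "c":]], axis=1)

print(list(result.columns))  # ['a', 'c', 'd', 'e', 'f']
```

The trade-off is an extra concatenation, in exchange for code that reads directly in terms of the column names.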

Upvotes: 6
