Reputation: 3375
How do I get the first column, then add another slice?
For example:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.rand(6,6), columns = list('abcdef'))
a b c d e f
0 0.147163 0.710360 0.069732 0.180949 0.694066 0.639505
1 0.771643 0.094805 0.371702 0.177538 0.089168 0.420331
2 0.431394 0.790537 0.378049 0.402930 0.350409 0.827950
3 0.421411 0.451595 0.703630 0.469526 0.612122 0.076728
4 0.854117 0.302925 0.664647 0.664098 0.959504 0.637122
5 0.659791 0.525526 0.007151 0.448761 0.738571 0.349142
I am trying to get column a, and all columns from c onward.
This gives me just columns c, d, e, f:
df.loc[:'a', 'c':]
This doesn't work at all:
df.loc['a':'a', 'c':]
I did a few more attempts but they are just random guessing and I genuinely can't find a solution online.
Note: I am working with a huge real dataframe, so it would be impractical to write out individual column names like df.loc[:,['a','c','d','e','f']]
Upvotes: 3
Views: 565
Reputation: 28649
One option for flexible column selection is with select_columns from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df.select_columns('a', slice('c', None))
a c d e f
0 0.147163 0.069732 0.180949 0.694066 0.639505
1 0.771643 0.371702 0.177538 0.089168 0.420331
2 0.431394 0.378049 0.402930 0.350409 0.827950
3 0.421411 0.703630 0.469526 0.612122 0.076728
4 0.854117 0.664647 0.664098 0.959504 0.637122
5 0.659791 0.007151 0.448761 0.738571 0.349142
Upvotes: 1
Reputation: 151
You can index the dataframe with a list of column names to get the result you want. First, grab a list of all the columns, then slice that list and feed the sliced columns back into the dataframe.
df.columns.to_list()
['a', 'b', 'c', 'd', 'e', 'f']
Slice the list:
cols = df.columns.to_list()
cols = cols[:1] + cols[2:]
cols
['a', 'c', 'd', 'e', 'f']
Index df with cols:
df[cols]
a c d e f
0 0.749754 0.291974 0.638897 0.768337 0.255553
1 0.541221 0.816086 0.472628 0.276530 0.946075
2 0.811953 0.692716 0.729467 0.512503 0.589812
3 0.613418 0.588730 0.497962 0.122666 0.153101
4 0.600428 0.897041 0.643585 0.382276 0.164303
5 0.165782 0.107455 0.149544 0.309294 0.544864
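If you don't want to hard-code the position 2, the list slicing above can look up where the second slice starts by name. This is only a sketch: the integer-filled df and the names start, selected and result are made up for illustration, and it assumes the column labels are unique.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in data: integers instead of random floats
df = pd.DataFrame(np.arange(36).reshape(6, 6), columns=list("abcdef"))

cols = df.columns.to_list()
start = cols.index("c")              # position where the trailing slice begins
selected = cols[:1] + cols[start:]   # first column, then everything from 'c' onward
result = df[selected]
print(result.columns.to_list())      # ['a', 'c', 'd', 'e', 'f']
```

list.index raises ValueError if the label is missing, so a typo in the column name fails loudly instead of silently selecting the wrong columns.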
Upvotes: 2
Reputation: 323226
We can use np.r_ (after import numpy as np):
df.iloc[:,np.r_[0,2:df.shape[1]]]
Out[99]:
a c d e f
0 0.147163 0.069732 0.180949 0.694066 0.639505
1 0.771643 0.371702 0.177538 0.089168 0.420331
2 0.431394 0.378049 0.402930 0.350409 0.827950
3 0.421411 0.703630 0.469526 0.612122 0.076728
4 0.854117 0.664647 0.664098 0.959504 0.637122
5 0.659791 0.007151 0.448761 0.738571 0.349142
To get the position, use get_indexer:
df.columns.get_indexer(['c'])
Out[100]: array([2], dtype=int64)
def drop_from_here_to_there(df, here, there):
    n, m = df.shape
    i, j = df.columns.get_indexer([here, there])
    k = np.r_[0:i+1, j:m]
    return df.iloc[:, k]
drop_from_here_to_there(df, 'a', 'c')
a c d e f
0 0.147163 0.069732 0.180949 0.694066 0.639505
1 0.771643 0.371702 0.177538 0.089168 0.420331
2 0.431394 0.378049 0.402930 0.350409 0.827950
3 0.421411 0.703630 0.469526 0.612122 0.076728
4 0.854117 0.664647 0.664098 0.959504 0.637122
5 0.659791 0.007151 0.448761 0.738571 0.349142
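To see what the pieces above produce on their own, here is a small sketch. The integer-filled df and the label 'z' are made up for illustration. np.r_ concatenates scalars and slices into one integer array, and get_indexer returns -1 for a label that is not present, which is worth checking before passing positions to iloc.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in data: integers instead of random floats
df = pd.DataFrame(np.arange(36).reshape(6, 6), columns=list("abcdef"))

# np.r_ turns a mix of scalars and slices into a single position array
positions = np.r_[0, 2:df.shape[1]]
print(positions)                     # [0 2 3 4 5]

# get_indexer does not raise on unknown labels; it marks them with -1
idx = df.columns.get_indexer(["c", "z"])
print(idx)                           # [ 2 -1]

result = df.iloc[:, positions]
```

Because -1 is also a valid position for iloc (the last column), a missing label would silently select the wrong column, so a check such as (idx == -1).any() before slicing is probably a good idea.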
Upvotes: 5
Reputation: 294218
Use drop:
df.drop('b', axis=1)
a c d e f
0 0.147163 0.069732 0.180949 0.694066 0.639505
1 0.771643 0.371702 0.177538 0.089168 0.420331
2 0.431394 0.378049 0.402930 0.350409 0.827950
3 0.421411 0.703630 0.469526 0.612122 0.076728
4 0.854117 0.664647 0.664098 0.959504 0.637122
5 0.659791 0.007151 0.448761 0.738571 0.349142
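For a wide dataframe where many columns sit between the two you want to keep, the same idea can be driven by labels instead of naming each dropped column. A minimal sketch, assuming unique column labels (so get_loc returns a plain integer); the integer-filled df and the names lo, hi, to_drop and result are just for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in data: integers instead of random floats
df = pd.DataFrame(np.arange(36).reshape(6, 6), columns=list("abcdef"))

lo = df.columns.get_loc("a")         # last column to keep on the left
hi = df.columns.get_loc("c")         # first column to keep on the right
to_drop = df.columns[lo + 1:hi]      # everything strictly between them
result = df.drop(columns=to_drop)
print(result.columns.to_list())      # ['a', 'c', 'd', 'e', 'f']
```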
Upvotes: 2
Reputation: 862406
As I understand it, the question is how to select by column names.
This is not straightforward: first get the positions with Index.get_loc, pass them to numpy.r_, and then select with DataFrame.iloc:
a = df.columns.get_loc('a')
b = df.columns.get_loc('c')
c = len(df.columns)
df = df.iloc[:, np.r_[a, b:c]]
print(df)
a c d e f
0 0.210653 0.218035 0.845753 0.456271 0.279802
1 0.932892 0.909715 0.043418 0.707115 0.483889
2 0.444221 0.040683 0.332754 0.947120 0.617660
3 0.368875 0.206132 0.165066 0.361817 0.863353
4 0.509402 0.950252 0.815966 0.322974 0.972098
5 0.987351 0.655923 0.405653 0.257348 0.082653
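If you would rather stay with labels and skip the positions entirely, one possible variation is to let loc resolve the trailing slice and then build the column list from it. This is a sketch rather than the approach above: the integer-filled df and the names tail and result are illustrative, and it assumes unique column labels with 'a' appearing before 'c'.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in data: integers instead of random floats
df = pd.DataFrame(np.arange(36).reshape(6, 6), columns=list("abcdef"))

tail = df.loc[:, "c":].columns       # labels from 'c' to the last column
result = df[["a", *tail]]            # single column plus the label slice
print(result.columns.to_list())      # ['a', 'c', 'd', 'e', 'f']
```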
Upvotes: 6