Removing dataset columns with python slices

Question

I have the following dataframe:

import pandas as pd
dataframe = pd.DataFrame({'Date': ['2017-04-01 00:24:17','2017-04-01 00:54:16','2017-04-01 01:24:17'] * 1000, 'Luminosity':[2,3,4] * 1000})

Printing dataframe gives:

      Date                Luminosity
0   2017-04-01 00:24:17     2
1   2017-04-01 00:54:16     3
2   2017-04-01 01:24:17     4
.           .               . 
.           .               .

I want to select only the Luminosity column (i.e. drop the others), so using Python slices I do the following:

X = dataframe.iloc[:, 1].values
# Give a new form of the data
X = X.reshape(-1, 1)

And the output of X is the following numpy array:

array([[2],
       [3],
       [4],
       ...,
       [2],
       [3],
       [4]])
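A side note on the reshape step: passing a list of positions (or a slice) to iloc, instead of a single integer, keeps the column dimension, so .values is already 2-D. A minimal sketch on the same toy dataframe; the names X_reshaped, X_list and X_slice are only illustrative:

```python
import pandas as pd

dataframe = pd.DataFrame({'Date': ['2017-04-01 00:24:17', '2017-04-01 00:54:16',
                                   '2017-04-01 01:24:17'] * 1000,
                          'Luminosity': [2, 3, 4] * 1000})

# A scalar position returns a 1-D Series, so .values needs reshape(-1, 1)
X_reshaped = dataframe.iloc[:, 1].values.reshape(-1, 1)

# A list of positions, or a slice such as 1:2, returns a DataFrame,
# so .values is already a 2-D array with one column
X_list = dataframe.iloc[:, [1]].values
X_slice = dataframe.iloc[:, 1:2].values

print(X_reshaped.shape, X_list.shape, X_slice.shape)  # (3000, 1) (3000, 1) (3000, 1)
```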

I have the same situation, but with a new dataframe that has 76 columns.

[Screenshot: the 76-column dataframe as read]

In total the dataframe has 76 columns, and I only want to select 25 of them: the columns named PORVL2N1, PORVL2N2, PORVL4N1 and so on, up to the last one, PORVL24N2, which is the 76th column.

For now, my solution is to create a new dataframe containing only the columns I'm interested in:

a = df[['PORVL2N1', 'PORVL2N2', 'PORVL4N1', 'PORVL5N1', 'PORVL6N1', 'PORVL7N1',
        'PORVL9N1', 'PORVL9N1', 'PORVL10N1', 'PORVL13N1', 'PORVL14N1', 'PORVL15N1',
        'PORVL16N1', 'PORVL16N2', 'PORVL18N1', 'PORVL18N2', 'PORVL18N3', 'PORVL18N4',
        'PORVL21N1', 'PORVL21N2', 'PORVL21N3', 'PORVL21N4', 'PORVL21N5', 'PORVL24N1',
        'PORVL24N2']]

This returns a new dataframe holding only those columns.

I want to do the same thing, selecting just the columns I'm interested in, but using Python slices with iloc to index and select by position, as I did at the beginning of my question.

I know this is possible with slices, but I don't understand the slice syntax well enough to do it.

How can I use iloc and Python slices to select the columns I'm interested in?
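If you already have a list of labels like the one above, pandas can translate it into integer positions for iloc with Index.get_indexer. A minimal sketch; the small dataframe and its column names are hypothetical stand-ins for the real 76-column one:

```python
import pandas as pd

# Hypothetical stand-in for the real 76-column dataframe
df = pd.DataFrame([[1, 2, 3, 4, 5]],
                  columns=['Date', 'PORVL2N1', 'OTHER', 'PORVL2N2', 'PORVL4N1'])

wanted = ['PORVL2N1', 'PORVL2N2', 'PORVL4N1']

# Translate column labels into integer positions (-1 would mean "label not found")
positions = df.columns.get_indexer(wanted)
print(positions)  # [1 3 4]

# Positional selection: all rows, only those column positions
a = df.iloc[:, positions]
print(list(a.columns))  # ['PORVL2N1', 'PORVL2N2', 'PORVL4N1']
```

get_indexer requires the column labels themselves to be unique; the list you pass in may repeat names.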

harpan · Accepted Answer

Assuming your data is in a dataframe df, you can do the following:

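# Integer positions of every column whose name starts with 'PORVL'
cols = list(df.columns)
pos_cols = [i for i, word in enumerate(cols) if word.startswith('PORVL')]
# Select all rows and only those column positions
df.iloc[:, pos_cols]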
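The same idea can be written with pandas' vectorised string methods on the column index, which build the boolean mask (or the positions) without a Python loop. A sketch with hypothetical column names:

```python
import numpy as np
import pandas as pd

# Hypothetical columns; only two start with 'PORVL'
df = pd.DataFrame([[0, 1, 2, 3]], columns=['Date', 'PORVL2N1', 'Temp', 'PORVL2N2'])

# Boolean array: True where the column name starts with 'PORVL'
mask = np.asarray(df.columns.str.startswith('PORVL'))

# iloc accepts a boolean array directly...
by_mask = df.iloc[:, mask]

# ...or the integer positions of the True entries
pos_cols = np.flatnonzero(mask)
by_pos = df.iloc[:, pos_cols]

print(pos_cols)               # [1 3]
print(list(by_mask.columns))  # ['PORVL2N1', 'PORVL2N2']
```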

Alternatively, you can use .filter() with a regular expression:

df.filter(regex="PORVL.*")
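DataFrame.filter(regex=...) keeps the columns whose label matches the pattern under re.search semantics, so the pattern can match anywhere in the name unless it is anchored with ^. A small sketch; the column names are made up to show the difference:

```python
import pandas as pd

# Hypothetical columns; 'XPORVL' contains 'PORVL' but does not start with it
df = pd.DataFrame([[0, 1, 2, 3]], columns=['Date', 'PORVL2N1', 'XPORVL', 'PORVL24N2'])

# Unanchored: matches 'PORVL' anywhere in the column name
anywhere = df.filter(regex='PORVL')
# Anchored: only names that start with 'PORVL'
prefix_only = df.filter(regex='^PORVL')

print(list(anywhere.columns))     # ['PORVL2N1', 'XPORVL', 'PORVL24N2']
print(list(prefix_only.columns))  # ['PORVL2N1', 'PORVL24N2']
```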

Have a look at the pandas documentation for DataFrame.filter for more information.
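On the slice syntax the question asks about: iloc takes start:stop:step slices by position (stop is exclusive), and NumPy's np.r_ can concatenate several slices and single positions into one index array. A sketch on a hypothetical 10-column frame, since the real column positions are only visible in the screenshot:

```python
import numpy as np
import pandas as pd

# Hypothetical 10-column frame; column i is named 'c{i}'
df = pd.DataFrame([list(range(10))], columns=[f'c{i}' for i in range(10)])

print(list(df.iloc[:, 2:5].columns))  # stop is exclusive: ['c2', 'c3', 'c4']
print(list(df.iloc[:, ::3].columns))  # every third column: ['c0', 'c3', 'c6', 'c9']
print(list(df.iloc[:, -2:].columns))  # last two columns: ['c8', 'c9']

# np.r_ joins several slices and single positions into one integer array
positions = np.r_[1:3, 5, 7:10]
print(positions)  # [1 2 5 7 8 9]
print(list(df.iloc[:, positions].columns))  # ['c1', 'c2', 'c5', 'c7', 'c8', 'c9']
```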
