Garrett Mark Scott

Reputation: 285

Selecting Various "Pieces" of a List

I have a list of columns in a pandas DataFrame, and I am looking to create a list of certain columns without entering them manually.

My issue is that I am still learning and not yet knowledgeable enough.

I have tried searching the internet, but nothing quite matched my case. I apologize if this is a duplicate.

The list I am trying to cut from looks like this:

['model', 'displ', 'cyl', 'trans', 'drive', 'fuel', 'veh_class', 'air_pollution_score', 'city_mpg', 'hwy_mpg', 'cmb_mpg', 'greenhouse_gas_score', 'smartway']

Here is the code that I wrote on my own: dataframe.columns.tolist()[:6,8:10,11]

In this case, I am trying to select everything except 'air_pollution_score' and 'greenhouse_gas_score'.

My ultimate goal is to understand the syntax and how to select pieces of a list.
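For reference, here is a minimal sketch of plain list slicing on the column list shown above (no DataFrame needed). A Python list subscript accepts a single integer or a single slice, so the comma-separated attempt raises TypeError. Separate slices can be joined with +. Note that the indices in the attempt are also off by one for this list: 'air_pollution_score' is at index 7 and 'greenhouse_gas_score' is at index 11.

```python
# Column names from the question, in their original order.
cols = ['model', 'displ', 'cyl', 'trans', 'drive', 'fuel', 'veh_class',
        'air_pollution_score', 'city_mpg', 'hwy_mpg', 'cmb_mpg',
        'greenhouse_gas_score', 'smartway']

# A list subscript takes one integer or one slice, so cols[:6, 8:10, 11]
# raises TypeError. Take each piece as its own slice and join them with +.
# Keep indices 0-6, 8-10 and 12 onward (slice end points are exclusive).
selected = cols[:7] + cols[8:11] + cols[12:]

# The positions can also be looked up instead of counted by hand.
i = cols.index('air_pollution_score')
j = cols.index('greenhouse_gas_score')
selected_by_lookup = cols[:i] + cols[i + 1:j] + cols[j + 1:]

print(selected)
```

The resulting list can then be passed straight back to the DataFrame, e.g. dataframe[selected], to get a new DataFrame with only those columns.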

Upvotes: 4

Views: 69

Answers (3)

gmds

Reputation: 19885

You could do that, or you could just use drop to remove the columns you don't want:

dataframe.drop(['air_pollution_score', 'greenhouse_gas_score'], axis=1).columns

Note that you need to specify axis=1 so that pandas knows you want to remove columns, not rows.
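A small runnable sketch of the drop approach, assuming a hypothetical DataFrame with a few of the question's columns. It also shows the columns= keyword, which drop accepts as an alternative to passing axis=1. Either way, drop returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical stand-in data using some of the question's column names.
df = pd.DataFrame({
    'model': ['A', 'B'],
    'air_pollution_score': [6, 7],
    'city_mpg': [20, 25],
    'greenhouse_gas_score': [5, 6],
    'smartway': ['no', 'yes'],
})

exclude = ['air_pollution_score', 'greenhouse_gas_score']

# axis=1 tells drop to look for these labels among the columns, not the rows.
kept_cols = df.drop(exclude, axis=1).columns.tolist()

# Equivalent form: the columns= keyword implies axis=1.
kept_cols_kw = df.drop(columns=exclude).columns.tolist()

print(kept_cols)  # ['model', 'city_mpg', 'smartway']
```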

Even if you wanted to use list syntax, I would say that it's better to use a list comprehension instead; something like this:

exclude_columns = ['air_pollution_score', 'greenhouse_gas_score']

[col for col in dataframe.columns if col not in exclude_columns]

This gets all the columns in the dataframe unless they are present in exclude_columns.
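A runnable version of the list comprehension, assuming an empty DataFrame built from the question's column names as a stand-in for the real data (only the labels matter here). Using a set for exclude_columns is a small variation on the list above; it gives the same result, and the output order still follows dataframe.columns.

```python
import pandas as pd

cols = ['model', 'displ', 'cyl', 'trans', 'drive', 'fuel', 'veh_class',
        'air_pollution_score', 'city_mpg', 'hwy_mpg', 'cmb_mpg',
        'greenhouse_gas_score', 'smartway']

# An empty DataFrame is enough: the comprehension only looks at column labels.
dataframe = pd.DataFrame(columns=cols)

# Set membership tests are constant time; a list works the same for two items.
exclude_columns = {'air_pollution_score', 'greenhouse_gas_score'}

# Keep every column whose name is not in the exclusion set, in original order.
keep = [col for col in dataframe.columns if col not in exclude_columns]

# The resulting list can be used directly to select those columns.
subset = dataframe[keep]
print(keep)
```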

Upvotes: 5

dTanMan

Reputation: 137

Let's say df is your DataFrame. You can also use filter with a lambda, though it quickly gets long. I present this as a "one-liner" alternative to @gmds's answer.

df[
  list(filter(
    lambda x: x not in ('air_pollution_score', 'greenhouse_gas_score'),
    df.columns.values
  ))
]

Here is what's happening:

  1. filter applies a function to each element of an iterable and keeps only the elements for which the function returns True.
  2. We define that function with lambda so that it returns True only when the column name is neither 'air_pollution_score' nor 'greenhouse_gas_score'.
  3. We filter the df.columns.values array, so the resulting list retains every column except the two we excluded.
  4. We then use the df[['column1', 'column2']] syntax, which means "make a new DataFrame containing only the columns I list."
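A runnable sketch of the filter approach on a small hypothetical DataFrame. It compares each column name for an exact match against the excluded names; a substring test such as 'air_pollution_score' not in x would also drop any other column whose name merely contains that text.

```python
import pandas as pd

# Hypothetical stand-in data using some of the question's column names.
df = pd.DataFrame({
    'model': ['A', 'B'],
    'air_pollution_score': [6, 7],
    'city_mpg': [20, 25],
    'greenhouse_gas_score': [5, 6],
    'smartway': ['no', 'yes'],
})

exclude = ('air_pollution_score', 'greenhouse_gas_score')

# filter returns a lazy iterator of the names for which the lambda is True;
# list() materializes it so it can be used as a column selector.
keep = list(filter(lambda name: name not in exclude, df.columns))

# Indexing with a list of labels returns a new DataFrame with just those columns.
result = df[keep]
print(result.columns.tolist())  # ['model', 'city_mpg', 'smartway']
```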

Upvotes: 1

Mathanraj-Sharma

Reputation: 356

Simple solution with pandas

import pandas as pd

data = pd.read_csv('path to your csv file')
df = data[['column1', 'column2', 'column3', ...]]  # inner brackets: a list of the column names to keep

Note: data is the source DataFrame you have already loaded with pandas; the selected columns are stored in a new DataFrame, df.
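A minimal sketch of why the double brackets matter, using a hypothetical in-memory DataFrame instead of a CSV file. The inner brackets build a Python list of labels, and indexing with a list always returns a DataFrame; a single label in single brackets returns a Series instead.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded with pd.read_csv.
data = pd.DataFrame({
    'column1': [1, 2],
    'column2': [3, 4],
    'column3': [5, 6],
    'column4': [7, 8],
})

# Outer brackets index the DataFrame; inner brackets are the list of labels.
df = data[['column1', 'column2', 'column3']]

# A single label returns a Series; a one-item list returns a DataFrame.
one_series = data['column1']
one_frame = data[['column1']]

print(type(df).__name__, list(df.columns))
```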

Upvotes: 0