Reputation: 23
I have a dataframe with more than 400 columns, and I'm trying to select a sub-dataframe with about half of them based on some conditions. I have already stored the filtered column names in a list, hoping to use a for loop to iterate through them and build the new dataframe, but I keep getting only the last column in the list.
My list has the 200 filtered columns. I used the following for loop:
for i in list:
    df1 = df[["col1", "col2"]]
    df2 = df[[i]]
    df1 = df1.join(df2)
My final result should consist of "col1", "col2" and the 200 filtered columns, but the output I keep getting has only 3 columns: "col1", "col2", and the last column in the list.
Upvotes: 1
Views: 597
Reputation: 26
This can be done by indexing in Pandas (here's the documentation). Specifically, you can filter the dataframe in one step, without having to loop through it.
The general format is
df[filter condition here]
As an example, let's say we want to find all columns in the following dataframe that contain a 5:
import pandas as pd

d = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 5, 9], 'col4': [10, 11, 12], 'col5': [13, 5, 15]}
df = pd.DataFrame(data=d)
df.head()
Then we apply the filter condition to the dataframe and look at the output:
filter_condition = df.columns[df.isin([5]).any()]
print(filter_condition)
new_df = df[filter_condition]
new_df.head()
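For what it's worth, the same selection can be collapsed into a single .loc call with a boolean mask over the columns; a minimal sketch, reusing the df from above:

import pandas as pd

d = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 5, 9], 'col4': [10, 11, 12], 'col5': [13, 5, 15]}
df = pd.DataFrame(data=d)

# df.isin([5]).any() is a boolean Series indexed by column name,
# True for every column that contains a 5
mask = df.isin([5]).any()

# .loc accepts that boolean mask on the column axis, so this selects
# col2, col3 and col5 without building the index first
print(df.loc[:, mask])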
If you know the positions of the start and end columns, you can use the : operator to choose all the columns between them. For example, to choose all the columns between the 1st and the 5th (that is, the 2nd through 4th), you can use
df.iloc[:, 1:4]
to get

   col2  col3  col4
0     4     7    10
1     5     5    11
2     6     9    12
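If you know the column labels rather than their positions, .loc slices by name instead; note that unlike .iloc, a .loc label slice includes the end label. A small sketch with the same df:

import pandas as pd

d = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 5, 9], 'col4': [10, 11, 12], 'col5': [13, 5, 15]}
df = pd.DataFrame(data=d)

# Label-based slicing: the end label 'col4' is included in the result
print(df.loc[:, 'col2':'col4'])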
Upvotes: 0
Reputation: 260300
You should never join columns repeatedly. This is inefficient and will fragment the DataFrame.
Assuming your list is named lst, you should just do:
out = df[['col1', 'col2']+lst]
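As a runnable sketch, with a tiny made-up frame and a two-element lst standing in for your 400-column dataframe and 200 filtered names:

import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4],
                   'colA': [5, 6], 'colB': [7, 8]})
lst = ['colA', 'colB']  # stand-in for your 200 filtered column names

# One list of labels, one indexing operation, no loop
out = df[['col1', 'col2'] + lst]
print(out.columns.tolist())  # ['col1', 'col2', 'colA', 'colB']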
Your code failed because you reassign df1 to just ["col1", "col2"] at the start of every iteration, throwing away the column joined in the previous pass, so only the last one survives. Moving that assignment out of the loop would have worked, but it is really not a good approach:
df1 = df[["col1", "col2"]]
for i in lst:
    df2 = df[[i]]
    df1 = df1.join(df2)
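If you ever do have to assemble the result from per-column pieces, collect them first and concatenate once with pd.concat, which avoids the repeated-join fragmentation; a sketch, reusing the hypothetical df and lst from above:

import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4],
                   'colA': [5, 6], 'colB': [7, 8]})
lst = ['colA', 'colB']

# Build all the pieces, then concatenate along the column axis in one call
pieces = [df[['col1', 'col2']]] + [df[[c]] for c in lst]
out = pd.concat(pieces, axis=1)
print(out.columns.tolist())  # ['col1', 'col2', 'colA', 'colB']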
Upvotes: 0