cwalde

Reputation: 15

How to read Pandas Dataframe column information through string variable iteration

I have a Pandas DataFrame with columns 'Var_1_Access', 'Var_2_Access', ..., 'Var_N_Access'. Other columns sit between these, and I only want to pull out the 'Access' ones. For example:

data = pd.read_csv('File')
df = pd.DataFrame(data)
print(df.columns)


Index(['Var_1', 'Var_1_Access', 'Var_1_comp1', 'Var_1_comp2', 'Var_2', 'Var_2_Access', 'Var_2_comp1', 'Var_2_comp2'], dtype='object')

I would like to write a for loop that goes through the range of N and pulls out 'Var_1_Access' up to 'Var_N_Access'.

I've tried:

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df.f"Var_%i_Access" % i)

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df.Var_{i}_Access)

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df.Var_[i]_Access)

These all result in errors. I could type the names out by hand while N is small, but N will grow large, so I would rather build the names programmatically. The end goal is to read the DataFrame columns for all N variables so that Access_Matrix has shape [len(Var_N_Access), N]. More columns may also be added between these variables later, which is why I want to select by a column-name pattern rather than by column position.

I can provide more information if necessary, but I think that this covers the necessary information.

Upvotes: 1

Views: 715

Answers (2)

Vishnudev Krishnadas

Reputation: 10960

Use pandas.DataFrame.filter

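It selects the columns whose names match a regular expression and returns a new DataFrame containing only those columns. Use a raw string so the \d escape reaches the regex engine unchanged, and \d+ so that multi-digit numbers match as well:

access_df = df.filter(regex=r'^Var_\d+_Access$')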

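To restrict the match to variables 1 through N, a character class works, but only while N is a single digit (1 to 9), because [1-{N}] matches exactly one character:

access_df = df.filter(regex=f'^Var_[1-{N}]_Access$')

This avoids an explicit Python loop and returns the selected columns as a DataFrame in one call.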
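
For N above 9 a character class can no longer express the range. One possible workaround, sketched here under assumptions, is to match every Var_<number>_Access column with filter and then keep only the columns whose number is at most N. The sample frame and the helper name access_columns are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Hypothetical sample frame with 12 variables, laid out like the question's data.
df = pd.DataFrame(
    {
        name: [0, 1]
        for i in range(1, 13)
        for name in (f"Var_{i}", f"Var_{i}_Access", f"Var_{i}_comp1")
    }
)


def access_columns(frame, n):
    """Return the Var_i_Access columns of `frame` with 1 <= i <= n (illustrative helper)."""
    # First pass: every column named Var_<digits>_Access.
    matched = frame.filter(regex=r"^Var_\d+_Access$")
    # Second pass: keep those whose numeric part falls within 1..n.
    keep = [c for c in matched.columns if 1 <= int(c.split("_")[1]) <= n]
    return matched[keep]


access_df = access_columns(df, 11)
print(list(access_df.columns))
```

Columns come back in the order they appear in the original frame, not sorted by number.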

Upvotes: 1

David Buck

Reputation: 3828

You won't be able to do it with '.' notation, because attribute access needs a literal name, but you can build the column name with an f-string and look it up in square brackets.

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df[f"Var_{i}_Access"])
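
The loop leaves Access_Matrix as a plain list of Series. If a single two-dimensional object is wanted, one option is to join them side by side with pd.concat. A minimal sketch, assuming a small made-up frame and N = 2:

```python
import pandas as pd

# Made-up data; only the column naming follows the question.
df = pd.DataFrame(
    {"Var_1_Access": [1, 0], "Var_1_comp1": [3, 4], "Var_2_Access": [0, 1]}
)
N = 2

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df[f"Var_{i}_Access"])

# Join the Series column-wise: one column per variable.
access_df = pd.concat(Access_Matrix, axis=1)
print(access_df.shape)
```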

Or, perhaps a better approach would be to build up a list of the column names and extract them into a new dataframe in one go from df, e.g.:

cols = [f"Var_{i}_Access" for i in range(1, N+1)]
all_cols = df[cols]
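
The question asks for an Access_Matrix of shape [number of rows, N]. As a rough sketch of how the selected columns could be turned into such an array, assuming made-up data and N = 2 (the check for missing columns is an optional addition, not something the question requires):

```python
import pandas as pd

# Made-up data using the column layout from the question.
df = pd.DataFrame(
    {
        "Var_1": [10, 11, 12],
        "Var_1_Access": [1, 0, 1],
        "Var_1_comp1": [5, 6, 7],
        "Var_2": [20, 21, 22],
        "Var_2_Access": [0, 0, 1],
        "Var_2_comp1": [8, 9, 10],
    }
)
N = 2

cols = [f"Var_{i}_Access" for i in range(1, N + 1)]

# Fail early with a readable message if an expected column is absent.
missing = [c for c in cols if c not in df.columns]
if missing:
    raise KeyError(f"columns not found: {missing}")

# Select by name, then convert to a NumPy array of shape (rows, N).
Access_Matrix = df[cols].to_numpy()
print(Access_Matrix.shape)
```

Because the columns are selected by name, adding further columns between the variables later does not affect the result.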

Upvotes: 1
