Reputation: 15
I have a Pandas DataFrame with columns 'Var_1_Access', 'Var_2_Access', ..., 'Var_N_Access', with other columns in between them. I would like to pull out just these 'Access' columns. For example:
import pandas as pd

df = pd.read_csv('File')  # read_csv already returns a DataFrame
print(df.columns)
# Index(['Var_1', 'Var_1_Access', 'Var_1_comp1', 'Var_1_comp2', 'Var_2', 'Var_2_Access', 'Var_2_comp1', 'Var_2_comp2'], dtype='object')
I would like to write a for loop that goes through the range of N and pulls out 'Var_1_Access' up to 'Var_N_Access'.
I've tried:
Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df.f"Var_%i_Access" % i)

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df.Var_{i}_Access)

Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df.Var_[i]_Access)
These all result in errors. Yes, it would be possible to just type the names out while N is small, but N will grow large and I really don't want to type every variable name individually; I would rather index them. The end goal is to read the Pandas DataFrame information for N variables and have Access_Matrix be of shape [len(Var_N_Access), N]. Also, I may need to add more columns between these specific variables later, which is why I would like to select them by their string names, matching a pattern, rather than by column indices.
I can provide more information if necessary, but I think that this covers the necessary information.
Upvotes: 1
Views: 715
Reputation: 10960
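DataFrame.filter with a regex selects the matching columns and returns them as a new DataFrame. Using \d+ allows multi-digit numbers, and ^...$ stops partial matches:
access_df = df.filter(regex=r'^Var_\d+_Access$')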
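To stop at a given N, join the numbers into an alternation (a character class such as [1-{N}] only works while N is a single digit):
nums = '|'.join(str(i) for i in range(1, N + 1))
access_df = df.filter(regex=rf'^Var_({nums})_Access$')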
This selects all the matching columns in one call, without an explicit loop.
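To see why the pattern matters once N reaches double digits, here is a sketch on a made-up frame (the column names follow the question; N = 12, n = 10 and the zero values are assumptions for illustration). DataFrame.filter keeps a column when re.search finds the pattern anywhere in its name, so without ^ and $ a pattern can also hit longer names that merely contain it.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: N = 12 variables, each contributing
# 'Var_i', 'Var_i_Access', 'Var_i_comp1' and 'Var_i_comp2' columns.
N = 12
names = []
for i in range(1, N + 1):
    names += [f"Var_{i}", f"Var_{i}_Access", f"Var_{i}_comp1", f"Var_{i}_comp2"]
df = pd.DataFrame([[0] * len(names)] * 3, columns=names)

# '\d' is exactly one digit, so Var_10..Var_12 are missed.
print(df.filter(regex=r"Var_\d_Access").shape)  # (3, 9)

# '[1-12]' is a character class holding only '1' and '2'.
bad = df.filter(regex=r"Var_[1-12]_Access")
print(list(bad.columns))  # ['Var_1_Access', 'Var_2_Access']

# Any number of digits, anchored so only whole column names match.
all_access = df.filter(regex=r"^Var_\d+_Access$")
print(all_access.shape)  # (3, 12)

# Only variables 1..n, via an alternation of the numbers.
n = 10
nums = "|".join(str(i) for i in range(1, n + 1))
first_n = df.filter(regex=rf"^Var_({nums})_Access$")
print(first_n.shape)  # (3, 10)
```

The filtered columns come back in the DataFrame's own column order, not sorted by number.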
Upvotes: 1
Reputation: 3828
You can't build the column name dynamically with '.' (attribute) notation, but you can with square brackets and an f-string:
Access_Matrix = []
for i in range(1, N + 1):
    Access_Matrix.append(df[f"Var_{i}_Access"])
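The loop leaves Access_Matrix as a list of N Series. One way to reach the (number of rows, N) shape the question asks for is to stack them as columns with NumPy. This is a sketch on made-up data: the frame, N = 3 and the values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 4 rows, N = 3 variables with extra columns in between.
# Values encode the variable number so columns are easy to tell apart.
N = 3
df = pd.DataFrame({
    f"Var_{i}{suffix}": [10 * i + row for row in range(4)]
    for i in range(1, N + 1)
    for suffix in ("", "_Access", "_comp1", "_comp2")
})

Access_Matrix = []
for i in range(1, N + 1):
    # Square brackets accept any string, so the name can be built with an f-string.
    Access_Matrix.append(df[f"Var_{i}_Access"])

# Each Series becomes one column of a 2-D array: shape (rows, N).
access_array = np.column_stack(Access_Matrix)
print(access_array.shape)  # (4, 3)
```

pd.concat(Access_Matrix, axis=1) gives the same layout as a DataFrame and keeps the column names.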
Or, perhaps a better approach would be to build up a list of the column names and extract them into a new dataframe in one go from df, e.g.:
cols = [f"Var_{i}_Access" for i in range(1, N+1)]
all_cols = df[cols]
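Selecting by name also addresses the question's concern about adding columns between the variables later, because the list of names does not depend on column positions. A sketch on a made-up frame (names follow the question; the inserted 'Var_1_note' column and the values are assumptions), which also converts the selection to the (rows, N) array the question wants via DataFrame.to_numpy:

```python
import pandas as pd

# Hypothetical frame: 5 rows, N = 3 variables with extra columns in between.
N = 3
df = pd.DataFrame({
    f"Var_{i}{suffix}": [10 * i + row for row in range(5)]
    for i in range(1, N + 1)
    for suffix in ("", "_Access", "_comp1", "_comp2")
})

cols = [f"Var_{i}_Access" for i in range(1, N + 1)]
before = df[cols].to_numpy()

# Hypothetical extra column inserted between Var_1_Access and Var_1_comp1.
df.insert(2, "Var_1_note", "n/a")

after = df[cols].to_numpy()
print(after.shape)              # (5, 3)
print((before == after).all())  # True: name-based selection is unaffected
print(df.columns[5])            # Var_2 (position 5 was Var_2_Access before the insert)
```

If some Var_i_Access column might be missing, df[cols] raises a KeyError, while df.filter(items=cols) silently skips the missing names.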
Upvotes: 1