How can I define variables for several columns

Question

I'm creating a program that returns different statistics from any uploaded file (with a certain data structure).

I need code that defines a variable for each column in a file. The problem is that some files have 5 columns and others have 7, 8 or more.

Any thoughts? Maybe with a for loop?

I expect the program to read all the columns and name them x1, x2, x3 and so on.

MichaelD · Accepted Answer

If you don't specify column names, pandas assigns them for you: with header=None the columns are labelled 0, 1, 2 and so on. You can rename them after reading the file, or you can force them to be what you want when you read it.

For instance, letting pandas assign the column labels and then renaming them X1, X2, ...

import pandas as pd

df = pd.read_csv('test.csv', header=None)
df

    0   1   2   3   4   #<- Header names given by pandas
0   1   2   3   4   5

df.columns = [f"X{i + 1}" for i in df.columns]   # iterate over the column labels 0..4, not df.index
    X1  X2  X3  X4  X5
0   1   2   3   4   5
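
The renaming above can be wrapped in a small helper so that it works for any number of columns. This is only a minimal sketch: the function name read_with_numbered_columns, the prefix parameter and the in-memory sample data are assumptions made for illustration, not part of the original answer.

```python
import io

import pandas as pd


def read_with_numbered_columns(source, prefix="X"):
    """Read a header-less CSV and name its columns X1, X2, ... (hypothetical helper)."""
    df = pd.read_csv(source, header=None)
    # With header=None pandas labels the columns 0, 1, 2, ...; build 1-based names instead.
    df.columns = [f"{prefix}{i + 1}" for i in range(df.shape[1])]
    return df


# In-memory stand-ins for uploaded files with 5 and 7 columns.
five = read_with_numbered_columns(io.StringIO("1,2,3,4,5\n"))
seven = read_with_numbered_columns(io.StringIO("1,2,3,4,5,6,7\n"))

print(list(five.columns))
print(list(seven.columns))
```

Because the names are built from the number of columns actually read, the same call should handle files of 5, 7, 8 or more columns without changes.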

Or, if you want to give each column a specific name (the list must contain exactly one name per column), something like

df.columns = ['Foo', 'Bar', 'Baz', 'Biz', 'Boo']
    Foo Bar Baz Biz Boo
0   1   2   3   4   5

Or, if you prefer to ensure that all data has 8 columns regardless of what the user passed in, pass the names to read_csv. In this case you will get NaN in the unfilled columns (this assumes the file has at most 8 columns).

df = pd.read_csv('test.csv',header=None,names=['X1','X2','X3','X4','X5','X6','X7','X8'])
    X1  X2  X3  X4  X5  X6  X7  X8
0   1   2   3   4   5   NaN NaN NaN
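
To see the padding without needing a file on disk, here is a small sketch that reads a 5-column row from an in-memory string. The 8-name schema and the sample data are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical fixed schema of 8 columns, X1..X8.
names = [f"X{i}" for i in range(1, 9)]

# In-memory stand-in for a file that only has 5 columns.
padded = pd.read_csv(io.StringIO("1,2,3,4,5\n"), header=None, names=names)

# Columns with no data in the file contain only NaN.
empty_cols = [c for c in padded.columns if padded[c].isna().all()]
print(empty_cols)
```

Checking for all-NaN columns like this is one way to tell afterwards how many columns the uploaded file really had.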

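No matter how you code it, you end up with columns named either by you or by pandas, and you access each column by that name rather than through a separate variable.

df[0], df['Foo'], df['X1']   # the first column under the default, custom and generated names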
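
Since the question is about computing statistics, it may help to note that you can loop over the column names instead of creating one variable per column. The following is a sketch under assumed sample data; the choice of the mean as the statistic is only an example.

```python
import io

import pandas as pd

# In-memory stand-in for an uploaded file; any number of columns works the same way.
df = pd.read_csv(io.StringIO("1,2,3\n4,5,6\n"), header=None)
df.columns = [f"X{i + 1}" for i in range(df.shape[1])]

# Loop over whatever columns exist; no per-column variables are needed.
means = {name: df[name].mean() for name in df.columns}
print(means)

# Or let pandas summarise every numeric column at once.
print(df.describe())
```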
