Reputation: 21
I created a pandas DataFrame that holds various summary statistics for several variables in my dataset. I want to name the columns of the dataframe, but every time I try, it deletes all my data. Here is what it looks like without column names:
MIN = df.min(axis=0, numeric_only=True)
MAX = df.max(axis=0, numeric_only=True)
RANGE = MAX-MIN
MEAN = df.mean(axis=0, numeric_only=True)
MED = df.median(axis=0, numeric_only=True)
sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)
sum_stats = pd.DataFrame(data=sum_stats)
sum_stats
But for some reason, when I add column names like this, all of the data disappears:
sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)
columns = ['MIN', 'MAX', 'RANGE', 'MEAN', 'MED']
sum_stats = pd.DataFrame(data=sum_stats, columns=columns)
sum_stats
Any idea why this is happening?
Upvotes: 1
Views: 192
From the documentation for the columns parameter of the pd.DataFrame constructor:
[...] If data contains column labels, will perform column selection instead.
That means that if the data passed is already a dataframe, for example, the columns parameter will act as a list of columns to select from the data.
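In your case, pd.concat on unnamed Series produces the default integer column labels 0 through 4, so none of 'MIN', 'MAX', 'RANGE', 'MEAN', 'MED' exist in the data and every selected column comes back filled with NaN. If you change columns to a list of labels that already exist in the dataframe you're using, e.g. columns=[1, 4], you'll see that the resulting dataframe contains only those two columns, copied from the original dataframe.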
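A minimal sketch of that selection behavior, using a small made-up frame (the values and the name stats are illustrative assumptions, not taken from the question):

```python
import pandas as pd

# Hypothetical stand-in for the concatenated statistics: unnamed Series
# concatenated along axis=1 get the default integer column labels 0..4.
stats = pd.concat(
    [pd.Series([1.0, 2.0], index=["a", "b"]) * k for k in range(1, 6)],
    axis=1,
)

# Labels that exist in the data: the constructor selects those columns.
selected = pd.DataFrame(data=stats, columns=[1, 4])

# Labels that do not exist: every requested column comes back as NaN.
missing = pd.DataFrame(data=stats, columns=["MIN", "MAX"])

print(selected)
print(missing)
```

The row index is preserved in both cases; only the column lookup differs.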
Instead, you can assign the columns after you create the dataframe:
sum_stats.columns = ['MIN', 'MAX', 'RANGE', 'MEAN', 'MED']
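For reference, here is a self-contained sketch of that fix on a small made-up dataset (the frame df and its values are assumptions for illustration). It also shows one possible alternative, passing keys to pd.concat, which labels the columns at concatenation time:

```python
import pandas as pd

# Hypothetical data standing in for the asker's dataset.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10.0, 20.0, 60.0]})

MIN = df.min(axis=0, numeric_only=True)
MAX = df.max(axis=0, numeric_only=True)
RANGE = MAX - MIN
MEAN = df.mean(axis=0, numeric_only=True)
MED = df.median(axis=0, numeric_only=True)

names = ["MIN", "MAX", "RANGE", "MEAN", "MED"]

# Option 1: concatenate first, then overwrite the column labels.
sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)
sum_stats.columns = names

# Option 2: let concat label each Series through the keys argument.
sum_stats_keys = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1, keys=names)

print(sum_stats)
```

Either way the data is kept, because no column selection against nonexistent labels takes place.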
Upvotes: 1