Yehudis Bensinger

Reputation: 21

When I name the columns of a DataFrame, it deletes my data

I created a pandas DataFrame that holds various summary statistics for several variables in my dataset. I want to name the columns of the dataframe, but every time I try it deletes all my data. Here is what it looks like without column names:

MIN = df.min(axis=0, numeric_only=True)
MAX = df.max(axis=0, numeric_only=True)
RANGE = MAX-MIN
MEAN = df.mean(axis=0, numeric_only=True)
MED = df.median(axis=0, numeric_only=True)

sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)
sum_stats = pd.DataFrame(data=sum_stats)
sum_stats

My output looks like this: [image]

But for some reason when I add column names:

sum_stats = pd.concat([MIN, MAX, RANGE, MEAN, MED], axis=1)
columns = ['MIN', 'MAX', 'RANGE', 'MEAN', 'MED']
sum_stats = pd.DataFrame(data=sum_stats, columns=columns)
sum_stats

My output becomes this: [image]

Any idea why this is happening?

Upvotes: 1

Views: 192

Answers (1)

user17242583

From the documentation for the columns parameter of the pd.DataFrame constructor:

[...] If data contains column labels, will perform column selection instead.

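That means that if the data you pass is already a DataFrame, the columns parameter acts as a list of columns to select from it rather than as new names. Your concatenated frame has the integer column labels 0 through 4, so asking for 'MIN', 'MAX', and so on selects columns that don't exist, and they come back filled with NaN.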

If you set columns to a list of labels that already exist in the dataframe you're passing, e.g. columns=[1, 4], you'll see that the resulting dataframe contains only those two columns, copied from the original dataframe.
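
To see the selection behaviour in isolation, here is a minimal sketch. The small df below is made-up sample data standing in for your dataset, and only two of the statistics are computed to keep it short:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.5, 6.0]})

MIN = df.min(axis=0, numeric_only=True)
MAX = df.max(axis=0, numeric_only=True)

# Unnamed Series concatenated along axis=1 get integer column labels.
stats = pd.concat([MIN, MAX], axis=1)
print(list(stats.columns))

# Labels that do not exist in `stats` are "selected" as all-NaN columns.
renamed = pd.DataFrame(data=stats, columns=["MIN", "MAX"])
print(renamed)

# A label that does exist is selected with its data intact.
subset = pd.DataFrame(data=stats, columns=[1])
print(subset)
```

The row index (a, b) survives in all three frames; only the column lookup differs.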


Instead, you can assign the columns after you create the dataframe:

sum_stats.columns = ['MIN', 'MAX', 'RANGE', 'MEAN', 'MED']
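
Put together, a runnable sketch of that fix might look like the following. The df is again made-up sample data, and the second option (passing keys= to pd.concat) is an alternative I'm adding here, not something the question used:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.5, 6.0]})

MIN = df.min(axis=0, numeric_only=True)
MAX = df.max(axis=0, numeric_only=True)
RANGE = MAX - MIN

names = ["MIN", "MAX", "RANGE"]

# Option 1: concatenate first, then overwrite the column labels.
sum_stats = pd.concat([MIN, MAX, RANGE], axis=1)
sum_stats.columns = names

# Option 2: let concat label the columns via its keys argument.
sum_stats_keys = pd.concat([MIN, MAX, RANGE], axis=1, keys=names)

print(sum_stats)
```

Either way the data is kept, because no column selection takes place. Note that pd.concat with axis=1 already returns a DataFrame, so the extra pd.DataFrame(data=sum_stats) call isn't needed.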

Upvotes: 1
