Iterating through a Pandas DataFrame is the same as Iterating through its Column-Names?

Question

I had thought that a Pandas DataFrame was basically represented as a collection of columns. That is, I thought the following two lines of code would produce the same lists of Series (for some arbitrary DataFrame df):

list1 = [item for item in df]
list2 = [df[col_name] for col_name in df.columns]

But apparently they're very different: treating the df like an iterable and stepping through it is exactly the same as stepping through df.columns, which of course just holds the column names:

import pandas as pd

df = pd.DataFrame({'col_1': [1,2,3,4,5], 'col_2':[5,6,7,8,9]})

for a, b in zip(df, df.columns):
    print(a, b, type(a), type(b), a == b)

Output:

col_1 col_1 <class 'str'> <class 'str'> True
col_2 col_2 <class 'str'> <class 'str'> True

Why is this? This seems very unintuitive to me.

(To be clear: I'm not asking how to get a list of the columns in a DataFrame, or how to step through the columns of a DataFrame.)

Allen Qin · Accepted Answer

When you iterate over a df directly, as in:

[item for item in df]

you are calling df.__iter__(), which returns an iterator over df._info_axis. That attribute looks up the axis named by df._info_axis_name, which for a DataFrame is 'columns', so iterating a DataFrame yields its column labels.
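A minimal sketch to check this, reusing the DataFrame from the question. Note that _info_axis_name and _info_axis are private pandas attributes, so treat them as implementation details that may change between versions; only the comparison with df.columns relies on public behavior.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [1, 2, 3, 4, 5], 'col_2': [5, 6, 7, 8, 9]})

# Plain iteration over the DataFrame yields the same labels as df.columns.
labels_from_df = list(iter(df))
labels_from_columns = list(df.columns)
print(labels_from_df)                         # ['col_1', 'col_2']
print(labels_from_df == labels_from_columns)  # True

# Private, version-dependent internals (not public API):
# _info_axis_name names the axis that __iter__ walks; for a DataFrame it is 'columns'.
print(df._info_axis_name)                 # columns
print(df._info_axis.equals(df.columns))   # True
```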

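By contrast, df[col_name] selects a single column and returns it as a Series. In other words, a DataFrame behaves like a dict of columns: iterating over it gives you the keys (the column labels), and indexing with a key gives you the corresponding column. The pandas documentation describes DataFrame iteration the same way, as following the dict-like convention of iterating over the object's keys.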
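A short sketch contrasting the two lists from the question, with a plain dict alongside for comparison. The names list3 and d are illustrative only; the point is that iterating gives labels, and looking each label up gives the columns.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [1, 2, 3, 4, 5], 'col_2': [5, 6, 7, 8, 9]})

list1 = [item for item in df]                       # column labels (str)
list2 = [df[col_name] for col_name in df.columns]   # one Series per column

print([type(x).__name__ for x in list1])  # ['str', 'str']
print([type(x).__name__ for x in list2])  # ['Series', 'Series']

# A plain dict follows the same "iterate keys, then look up" pattern.
d = {'col_1': [1, 2, 3, 4, 5], 'col_2': [5, 6, 7, 8, 9]}
print([key for key in d])    # ['col_1', 'col_2']
print([d[key] for key in d]) # [[1, 2, 3, 4, 5], [5, 6, 7, 8, 9]]

# Combining both steps on the DataFrame recovers list2 from plain iteration.
list3 = [df[name] for name in df]
print(all(a.equals(b) for a, b in zip(list2, list3)))  # True
```

The dict analogy extends further: df.items() yields (label, Series) pairs much as dict.items() yields (key, value) pairs.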
