qwerty

Reputation: 152

Why does `list(<pd.DataFrame>)` return a list of column names?

Let's say df is a typical pandas.DataFrame instance. I am trying to understand why list(df) returns a list of column names.

My goal is to track this down in the source code, so that I can see exactly how list(<pd.DataFrame>) ends up returning the column names.

So far, the best resources I've found are the following:

Upvotes: 1

Views: 736

Answers (2)

darren

Reputation: 5694

As you correctly stated in your question, one can think of a pandas DataFrame as a list of lists (or, more correctly, a dict-like object).

Take a look at this code, which takes a dict and builds a DataFrame from it.

import pandas as pd

# create a dataframe
d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(d)

print(df)

# list() on the DataFrame yields its column labels
x = list(df)
print(x)

# list() on the original dict yields its keys
x = list(d)
print(x)

The result in both cases (for the dataframe df and the dict d) is this:

['col1', 'col2']
['col1', 'col2']

This result confirms your thinking that a "DataFrame follows a dict-like convention".
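The same convention shows up in other dict-style operations. Here is a minimal sketch using the same data as above; the variable names are only illustrative.

```python
import pandas as pd

d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(d)

# keys() returns the column labels, much like dict.keys() returns the keys
df_keys = list(df.keys())

# membership tests look at column labels, not at cell values
has_col1 = 'col1' in df
has_value_1 = 1 in df

# items() yields (column label, column) pairs, much like dict.items()
item_labels = [label for label, column in df.items()]

print(df_keys, has_col1, has_value_1, item_labels)
```

So keys(), in, and items() all work on the column labels, just as they would work on the keys of d.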

Upvotes: 1

timgeb

Reputation: 78650

DataFrames are iterable. That's why you can pass them to the list constructor.

list(df) is equivalent to [c for c in df]. In both cases, DataFrame.__iter__ is called.

When you iterate over a DataFrame, you get the column names.
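A quick way to check this yourself, as a small sketch with a made-up two-column frame:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# Three ways of walking the same iterator protocol
via_list = list(df)
via_comprehension = [c for c in df]
via_iter = list(iter(df))

# The column labels, taken directly from the public attribute
from_columns = list(df.columns)

print(via_list == via_comprehension == via_iter == from_columns)
```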

Why? Because the developers probably thought this is a nice thing to have.

Looking at the source, __iter__ returns an iterator over the attribute _info_axis, which, for a DataFrame, seems to be the internal name for the columns axis.
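To see the mechanism without pandas in the way, here is a minimal sketch. MiniFrame is a hypothetical stand-in, not the actual pandas code. It only assumes what is described above: __iter__ hands back an iterator over an attribute that holds the column labels.

```python
class MiniFrame:
    """Hypothetical stand-in for a DataFrame; this is not pandas code."""

    def __init__(self, data):
        self._data = dict(data)
        # Assumption: this plays the role of the axis holding the column labels
        self._info_axis = list(self._data)

    def __iter__(self):
        # list(), for loops and comprehensions all end up calling this method
        return iter(self._info_axis)


mf = MiniFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

as_list = list(mf)
as_comprehension = [c for c in mf]

print(as_list)
print(as_comprehension)
```

Any object that defines __iter__ this way can be passed to list(), which is all that list(df) relies on.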

Upvotes: 2
