Reputation: 300
pandas is a huge library in Python.
import pandas as pd
pd.__path__
['/usr/local/lib/python3.5/dist-packages/pandas']
I know the pandas library is located in /usr/local/lib/python3.5/dist-packages/pandas.
data = {'Name':['Tom', 'nick'], 'Age':[20, 21]}
df = pd.DataFrame(data)
df.columns
Index(['Age', 'Name'], dtype='object')
columns is an attribute of the DataFrame. I want to know where the DataFrame's columns attribute is defined.
ls /usr/local/lib/python3.5/dist-packages/pandas
api conftest.py __init__.py plotting tests _version.py
arrays core io __pycache__ tseries
compat errors _libs testing.py util
In which directory, and in which file in that directory, is the columns attribute defined?
df.columns.__path__
doesn't give the answer.
Upvotes: 5
Views: 220
Reputation: 1903
columns
isn't defined in any single place. It is just an attribute on the DataFrame that points to an instance of another object. In particular, columns
must be an instance of pandas.core.indexes.base.Index
or one of its subclasses, which are also defined in submodules of pandas.core.indexes
but are also mostly accessible from the top-level module (e.g. pd.RangeIndex
).
I am distinguishing "defined" from two possibly-related ideas, one of which is where the attribute is assigned (self.columns = ...).
Where is Index defined?
The actual path to the base Index
class is at:
https://github.com/pandas-dev/pandas/blob/v1.0.3/pandas/core/indexes/base.py#L177
Likewise, on your local installation it will be at
[..]/python3.x/site-packages/pandas/core/indexes/base.py
.
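As a quick local check, here is a minimal sketch that uses the standard-library inspect module to ask Python where the class behind df.columns lives. It assumes pandas is installed; the sample data is the one from the question, and the printed path will differ from one installation to another.

```python
import inspect

import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "nick"], "Age": [20, 21]})

# The object behind df.columns is an Index (or one of its subclasses).
index_cls = type(df.columns)

print(index_cls.__module__)              # module that defines the class
print(inspect.getsourcefile(index_cls))  # file on disk; path varies per install
```

This only locates the class of the object that columns refers to, not the place where the attribute itself is attached to DataFrame.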
How do we know columns must be an instance of an Index?
Since Python isn't statically typed, this is kind of hard to prove/enforce. However, DataFrame
inherits from NDFrame
, which is its N-dimensional generalization (Series
is the 1D version). At the end of the day, NDFrame
stores data in an attribute called... _data
, which is an instance of BlockManager
. Here you can see that the typings on axes
(columns
is a kind of axis) are as an Index
. All (orthodox) modifications to these axes will be run through a function ensure_index
, which will convert, e.g., lists to proper indices.
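The coercion described above can be observed from the outside without touching any private attribute. The sketch below assumes pandas is installed and uses made-up replacement labels (first_name, age_years); it only shows the visible effect, not which internal function performs the conversion.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "nick"], "Age": [20, 21]})

# Assign a plain Python list as the new column labels.
df.columns = ["first_name", "age_years"]

# What comes back is an Index, not the list that was passed in.
print(type(df.columns).__name__)
print(isinstance(df.columns, pd.Index))
```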
How is the columns attribute set and retrieved?
(Maybe this was the main question?)
The index object that columns
refers to lives in pd.DataFrame._data.axes[0]
. Custom implementations of __getattr__
and __setattr__
then ensure that the call to DataFrame.columns
returns that element.
But let me back up.
The call to the _setup_axes
class method alters the DataFrame
class (not instance) to have attributes columns
and index
.
In particular, _setup_axes
sets the columns
attribute to be an AxisProperty
with argument axis=0
. You could maybe think of _setup_axes
as a promise that each instance of the DataFrame
will have labels for two axes and, further, that these axes have names.
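One way to see that columns lives on the class rather than on each instance is inspect.getattr_static, which looks an attribute up without running the descriptor protocol. This is a sketch that assumes pandas is installed; the exact type printed for the class attribute may depend on the pandas version.

```python
import inspect

import pandas as pd

# Look up "columns" on the class without triggering the descriptor protocol.
class_attr = inspect.getattr_static(pd.DataFrame, "columns")
print(type(class_attr))                      # a descriptor type, not an Index
print(hasattr(type(class_attr), "__get__"))  # descriptors implement __get__

df = pd.DataFrame({"Name": ["Tom", "nick"], "Age": [20, 21]})
print(type(df.columns))                      # instance access yields an Index
```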
So why do calls to df.columns
return an Index rather than an AxisProperty
?
A call to df.columns will:
1. Call __getattr__.
2. Fail to find columns among the entries in self._internal_names_set, so go to line 5270: return object.__getattribute__(self, name).
3. Invoke the __get__ method of AxisProperty. Notice that the second argument here (obj) is our DataFrame instance(!).
4. Get obj._data.axes, i.e. the _data[.axes] attribute of the dataframe.
5. Return the element of obj._data.axes corresponding to self.axis. The call to _setup_axes had set self.axis=0 so we get the 0th element.

Setting df.columns
(after initialization) works in a similar manner. When the DataFrame is initialized the columns are coerced into an Index
type, added to a list of axes, and passed as an argument to init a BlockManager
, which is then assigned to the _data
attribute.
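The mechanism walked through above can be imitated in a few lines of plain Python. The classes below (ToyManager, ToyAxisProperty, ToyFrame) are hypothetical stand-ins written for illustration, not pandas code, and a tuple stands in for the Index coercion.

```python
class ToyManager:
    """Hypothetical stand-in for BlockManager: it only holds the axes."""

    def __init__(self, axes):
        self.axes = list(axes)


class ToyAxisProperty:
    """Hypothetical descriptor that exposes one entry of obj._data.axes."""

    def __init__(self, axis):
        self.axis = axis

    def __get__(self, obj, objtype=None):
        if obj is None:  # accessed on the class, not on an instance
            return self
        return obj._data.axes[self.axis]

    def __set__(self, obj, value):
        # The real library coerces to an Index; a tuple stands in for that here.
        obj._data.axes[self.axis] = tuple(value)


class ToyFrame:
    # Class-level attributes, analogous to what _setup_axes attaches.
    columns = ToyAxisProperty(axis=0)
    index = ToyAxisProperty(axis=1)

    def __init__(self, columns, index):
        self._data = ToyManager([tuple(columns), tuple(index)])


frame = ToyFrame(columns=["Age", "Name"], index=[0, 1])
print(frame.columns)                    # ('Age', 'Name')
frame.columns = ["age", "name"]
print(frame.columns)                    # ('age', 'name')
print(type(ToyFrame.columns).__name__)  # ToyAxisProperty
```

Reading frame.columns returns the stored labels, while reading ToyFrame.columns on the class returns the descriptor itself, which mirrors the Index versus AxisProperty distinction in the answer.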
Upvotes: 2
Reputation: 117
>>> import pandas as pd
>>> import inspect
>>> inspect.getfile(pd.DataFrame)
'/Users/.../lib/python3.7/site-packages/pandas/core/frame.py'
DataFrames would be initialized via __init__
:
https://github.com/pandas-dev/pandas/blob/v1.0.3/pandas/core/frame.py#L414
Specifically, when constructing a DataFrame with DataFrame.from_dict, the @classmethod
instantiates the DataFrame by calling cls(...):
https://github.com/pandas-dev/pandas/blob/v1.0.3/pandas/core/frame.py#L1169
@classmethod
def from_dict(cls, data, orient="columns", dtype=None, columns=None) -> "DataFrame":
    ...
    return cls(data, index=index, columns=columns, dtype=dtype)
I checked that file on GitHub and think this is where the columns
attribute is set:
https://github.com/pandas-dev/pandas/blob/v1.0.3/pandas/core/frame.py#L8449
DataFrame._setup_axes(
    ["index", "columns"],
    docs={
        "index": "The index (row labels) of the DataFrame.",
        "columns": "The column labels of the DataFrame.",
    },
)
EDIT: Added reference to def __init__
, def from_dict
and changed paths to stable pandas version
Upvotes: 5