vestland

Reputation: 61134

Is it possible to read source code from the pandas library and use that in my own functions?

Background:

I'd like to slice a pandas DataFrame into subsets of a given number of rows and perform calculations on them.

pandas.DataFrame.rolling will let me do that, but seemingly only with built-in functions like sum(), as in the example df.rolling(2, win_type='triang').sum(). I would also like to plot these subsets (I can do that with slicing and some for loops, but it's a bit slow).
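For illustration, here is one way to build those row subsets with plain iloc slicing. This is a hypothetical sketch (the helper row_windows is not part of pandas, and the actual loop isn't shown here); it assumes overlapping windows of n consecutive rows, moving forward one row at a time.

```python
import pandas as pd


def row_windows(df: pd.DataFrame, n: int) -> list:
    """Hypothetical helper: every block of n consecutive rows (overlapping, step 1)."""
    # iloc slices by position, so this works regardless of the index labels;
    # if n is larger than the frame, range() is empty and an empty list is returned
    return [df.iloc[start:start + n] for start in range(len(df) - n + 1)]


df = pd.DataFrame({"a": [1, 4, 2, 8, 5]})
windows = row_windows(df, 3)
print(len(windows))              # 3
print(windows[1]["a"].tolist())  # [4, 2, 8]
```

Each element is a small DataFrame that could be passed to a plotting call; building one Python object per window is what makes this approach slow on large frames.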

What I've found out:

From "How can I get the source code of a Python function?" I've learned that in IPython I can read the source code with pandas.DataFrame.rolling??, which gives me this:

[Screenshot: output of pandas.DataFrame.rolling??]

But trying to dig deeper from there, for example with rolling??, seems futile:

[Screenshot: output of rolling??]

So, is it possible to reference the underlying functions of pandas.DataFrame.rolling somehow, or is this where it ends when using Python? I suspect it is, since the docs state that pandas is written in Cython or C, but I'm really curious about this, so I'd like to ask here as well.

Thank you for any suggestions!

Upvotes: 3

Views: 5049

Answers (3)

tevemadar

Reputation: 13225

Good news and bad news at once: your suffering does not end there.

[side-note]
It is easy to lose track of where the source code is located on your system, especially if you use extra layers like Anaconda.
When in doubt, you can check the __file__ attribute in an interactive shell:

>>> import pandas
>>> pandas.__file__
'C:\\Users\\xy\\AppData\\Local\\Continuum\\Anaconda3\\lib\\site-packages\\pandas\\__init__.py'

[/side-note]
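If you want the exact file and line rather than the package folder, the standard-library inspect module can report them for a single method. This is a minimal sketch; which file it prints depends on your installed pandas version, and inspect.getsource returns the same source text that IPython's ?? displays.

```python
import inspect

import pandas

# Follow any functools.wraps decorator chain to the real definition
func = inspect.unwrap(pandas.DataFrame.rolling)

# Path of the .py file that defines DataFrame.rolling in this installation
src_file = inspect.getsourcefile(func)
# Source lines of the definition and the line number where it starts
lines, first_line = inspect.getsourcelines(func)

print(src_file)
print(first_line)
print("".join(lines[:10]))  # first lines of the definition, as ?? would show them
```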

If you look up that piece of code, you will find it in NDFrame (the common base class of Series and DataFrame) in pandas/core/generic.py, with an import line just before it:

from pandas.core import window as rwindow

@Appender(rwindow.rolling.__doc__)
def rolling(self, window, min_periods=None, freq=None, center=False,
            win_type=None, on=None, axis=0, closed=None):
    axis = self._get_axis_number(axis)
    return rwindow.rolling(self, window=window,
                           min_periods=min_periods, freq=freq,
                           center=center, win_type=win_type,
                           on=on, axis=axis, closed=closed)

So your adventure continues in pandas/core/window.py, where rolling is defined near the very end:

def rolling(obj, win_type=None, **kwds):
    from pandas import Series, DataFrame
    if not isinstance(obj, (Series, DataFrame)):
        raise TypeError('invalid type: %s' % type(obj))

    if win_type is not None:
        return Window(obj, win_type=win_type, **kwds)

    return Rolling(obj, **kwds)

All of Window, Rolling, and their parent classes (_Window, _Rolling_and_Expanding, and _Rolling, which itself derives from _Window) stretch over thousands of lines in the same file.
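Since file names and class names have moved around between pandas releases, it can be easier to ask the returned object where it lives than to trust a hard-coded path. A minimal sketch, assuming only that df.rolling without win_type returns a Rolling object as in the dispatch function quoted above (with win_type set you would get Window instead, which may require SciPy, so it is left out here):

```python
import inspect

import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]})
r = df.rolling(2)  # no win_type, so pandas builds a Rolling object

# Class name and module tell you which file to open in your installed version
print(type(r).__name__)    # Rolling
print(type(r).__module__)  # the module layout differs between pandas versions
print(inspect.getsourcefile(type(r)))

# The method resolution order lists the parent classes in this version
print([cls.__name__ for cls in type(r).__mro__])
```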

Upvotes: 1

Nico Albers

Reputation: 1696

This is not an answer on how to read the source code, but on how to solve your stated problem:

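Use apply on the rolling object. For example, try df.rolling(2).apply(yourfunc, raw=True, args=(), kwargs={}). Note that apply is available on plain rolling windows; once you pass win_type (such as 'triang'), pandas returns a weighted Window object, which only supports built-in aggregations such as sum and mean.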

From the docs, yourfunc "must produce a single value from an ndarray input; *args and **kwargs are passed to the function."

This is the better approach: you shouldn't copy pandas source into your code and edit it unless you really need to (you would miss later bug fixes, and the copy can become outdated). Here pandas already lets you plug in your own function.
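A minimal sketch of the apply approach on a plain rolling window without win_type. The function peak_to_peak and its scale argument are hypothetical examples; raw=True makes each window arrive as a NumPy ndarray, and kwargs is forwarded to the function on every call.

```python
import numpy as np
import pandas as pd


def peak_to_peak(window: np.ndarray, scale: float = 1.0) -> float:
    """Hypothetical example function: scaled range (max - min) of one window."""
    return scale * (window.max() - window.min())


df = pd.DataFrame({"a": [1, 4, 2, 8, 5]})

# The first two rows are NaN because a full window of 3 rows is not yet available
result = df.rolling(3).apply(peak_to_peak, raw=True, kwargs={"scale": 2.0})
print(result["a"].tolist())  # [nan, nan, 6.0, 12.0, 12.0]
```

Each window is still a Python-level function call, so this is not free, but it saves you from building and slicing the subsets yourself.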

Upvotes: 5

LangeHaare
LangeHaare

Reputation: 3026

The Pandas source code is open source and currently available on GitHub at: https://github.com/pandas-dev/pandas

You could also look at the contributors' guide for an idea of how the code is laid out: https://pandas.pydata.org/pandas-docs/stable/contributing.html

The API docs also link to the source code of the function they describe, for example:

[Screenshot: pandas groupby.apply docs with a link to the source]
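To jump from your installed pandas to the matching file on GitHub, you can build a similar link yourself. This is a sketch under an assumption: that release tags in the pandas-dev/pandas repository are named v followed by the version string (development builds have no such tag, so the link would not resolve for them).

```python
import inspect
import os

import pandas as pd

# Follow any decorator chain so the file and line point at the real definition
func = inspect.unwrap(pd.DataFrame.groupby)
src_file = inspect.getsourcefile(func)
_, first_line = inspect.getsourcelines(func)

# Path relative to the directory that contains the installed pandas package,
# for example pandas/core/frame.py
pkg_root = os.path.dirname(os.path.dirname(pd.__file__))
rel_path = os.path.relpath(src_file, pkg_root).replace(os.sep, "/")

# Assumption: GitHub release tags look like v<version>
url = f"https://github.com/pandas-dev/pandas/blob/v{pd.__version__}/{rel_path}#L{first_line}"
print(url)
```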

Upvotes: 3
