ragesz

Reputation: 9527

Python pandas slice dataframe by multiple index ranges

What is the pythonic way to slice a dataframe by multiple index ranges (e.g. 10:12 and 25:28)?

I would like a more elegant version of this:

import pandas as pd

df = pd.DataFrame({'a':range(10,100)})
df.iloc[[i for i in range(10,12)] + [i for i in range(25,28)]]

Result:

     a
10  20
11  21
25  35
26  36
27  37

Something like this would be more elegant:

df.iloc[(10:12, 25:28)]

Upvotes: 50

Views: 35902

Answers (3)

Liquidgenius

Reputation: 708

Building on @KevinOelen's use of the pandas isin function, here is a pythonic way (Python 3.8) to glance at a pandas DataFrame or GeoPandas GeoDataFrame, showing just a few rows of the head and tail. This method does not require importing numpy, but it does assume a contiguous integer index.

To use it, call glance(your_df). The docstring explains the parameters.

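import pandas as pd
import geopandas as gpd  # if not needed, remove gpd.GeoDataFrame from the type hinting and no need to import Union
from typing import Union
from IPython.display import display  # usually available automatically in Jupyter; explicit import for use elsewhere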


def glance(df: Union[pd.DataFrame, gpd.GeoDataFrame], size: int = 2) -> None:
    """ Provides a shortened head and tail summary of a Dataframe or GeoDataFrame in Jupyter Notebook or Lab.

    Usage
    ----------
    # default glance (2 head rows, 2 tail rows)
    glance( df )
    
    # glance defined number of rows in head and tail (3 head rows, 3 tail rows)
    glance( df, size=3 )

    Parameters
    ----------
    :param df: Union[pd.DataFrame, gpd.GeoDataFrame]: A (Geo)Pandas data frame to glance at.
    :param size: int: The number of rows in the head and tail to display, total rows will be double provided size.
    :return: None: Displays result in Notebook or Lab.
    """
    
    # min and max of the provided dataframe index
    min_ = df.index.min()
    max_ = df.index.max()

    # define slice: first `size` and last `size` labels (assumes a contiguous integer index)
    sample = list(range(min_, min_ + size)) + list(range(max_ - size + 1, max_ + 1))

    # slice
    df = df[df.index.isin(sample)]
    
    # display
    display( df )
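
If the index is not a contiguous run of integers, a label-based isin filter will not pick the intended rows. A minimal sketch of a position-based alternative, using only head, tail and pd.concat, is shown here. The name glance_rows is hypothetical and not part of the answer above; it returns the rows rather than displaying them.

```python
import pandas as pd


def glance_rows(df: pd.DataFrame, size: int = 2) -> pd.DataFrame:
    """Return the first and last `size` rows as one DataFrame (hypothetical helper)."""
    # head() and tail() select by position, so the index type does not matter
    return pd.concat([df.head(size), df.tail(size)])


df = pd.DataFrame({"a": range(10, 100)})
preview = glance_rows(df)
print(preview)
```

Note that when the frame has fewer than 2 * size rows, the head and tail overlap and some rows appear twice.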

Upvotes: 0

Jon Clements

Reputation: 142156

You can use numpy's r_ "slicing trick":

df = pd.DataFrame({'a':range(10,100)})
df.iloc[pd.np.r_[10:12, 25:28]]

NOTE: this now gives the warning "The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead." (and pandas 2.0 removed pandas.np entirely). Instead, import numpy as np yourself and slice like this:

df.iloc[np.r_[10:12, 25:28]]

This gives:

     a
10  20
11  21
25  35
26  36
27  37
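
To see why this works, it may help to look at what np.r_ produces on its own: it concatenates the slices into a single integer array, which iloc then treats as a list of row positions. A minimal, self-contained sketch (the variable names positions and result are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(10, 100)})

# np.r_ expands each slice and concatenates them into one array of positions
positions = np.r_[10:12, 25:28]
print(positions)  # [10 11 25 26 27]

# iloc accepts that integer array and returns the rows at those positions
result = df.iloc[positions]
print(result)
```

Because iloc is positional, this gives the same rows regardless of what the index labels are.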

Upvotes: 98

KevinOelen

Reputation: 779

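You can take advantage of the pandas isin function. Note that it matches index labels, not positions, so it gives the same rows as iloc only when the index is the default 0, 1, 2, ... range.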

import pandas as pd

df = pd.DataFrame({'a':range(10,100)})
# index labels to keep: 10, 11 and 25, 26, 27
ls = list(range(10,12)) + list(range(25,28))
df[df.index.isin(ls)]


    a
10  20
11  21
25  35
26  36
27  37
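
One caveat worth illustrating: isin compares index labels, while iloc uses positions. The sketch below assumes a frame whose index starts at 100 instead of 0 (an example made up for this illustration) to show where the two approaches diverge.

```python
import pandas as pd

# Same data as above, but the index labels run from 100 to 189
df = pd.DataFrame({"a": range(10, 100)}, index=range(100, 190))
wanted = list(range(10, 12)) + list(range(25, 28))

# Label-based: none of the labels 10, 11, 25, 26, 27 exist in this index
by_label = df[df.index.isin(wanted)]
print(len(by_label))  # 0

# Position-based: rows 10, 11, 25, 26, 27 counted from the top
by_position = df.iloc[wanted]
print(by_position)
```

So isin is a good fit when the ranges refer to index values, and iloc when they refer to row positions.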

Upvotes: 9
