Martin Bachtold

Reputation: 49

pandas select range from index column

I need to write a function that selects a range of rows by index value (the first column).

1880    Aachen  1   Valid   L5  21.0    Fell    50.77500    6.08333 (50.775000, 6.083330)
1951    Aarhus  2   Valid   H6  720.0   Fell    56.18333    10.23333    (56.183330, 10.233330)
1952    Abee    6   Valid   EH4 107000.0    Fell    54.21667    -113.00000  (54.216670, -113.000000)
1976    Acapulco    10  Valid   Acapulcoite 1914.0  Fell    16.88333    -99.90000   (16.883330, -99.900000)
1902    Achiras 370 Valid   L6  780.0   Fell    -33.16667   -64.95000   (-33.166670, -64.950000)

How can I do this?

Upvotes: 2

Views: 13501

Answers (2)

eidal

Reputation: 175

At the time of writing, you can use the DataFrame property loc in pandas. The online documentation describes it as follows:

Access a group of rows and columns by label(s) or a boolean array.

If you create a DataFrame from your data, as Bill Armstrong does in his answer, label-based slicing with loc gives the following result without writing a new function. Unlike an ordinary Python slice, a loc slice includes both endpoints:

print(df.loc[1951:1976])
    
                 0   1      2  ...         6         7                     8
1951    Aarhus   2  Valid  ...  56.18333  10.23333  (56.18333, 10.23333)
1952      Abee   6  Valid  ...  54.21667    -113.0    (54.21667, -113.0)
1976  Acapulco  10  Valid  ...  16.88333     -99.9     (16.88333, -99.9)
[3 rows x 9 columns]
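
If the selection does have to live in a function, a thin wrapper around loc may be enough. The following is only a sketch under stated assumptions: the name select_index_range, the column names, and the cut-down data are invented for illustration. The sort_index call is there so the slice selects by value even when the rows arrive out of order.

```python
import pandas as pd

def select_index_range(df, start, stop):
    """Return rows whose index label lies in [start, stop], both ends included."""
    # Hypothetical helper: sort first so the label slice covers a value range
    # regardless of the original row order.
    return df.sort_index().loc[start:stop]

# Cut-down version of the question's data; column names are assumptions.
df = pd.DataFrame(
    {"name": ["Aachen", "Aarhus", "Abee", "Acapulco", "Achiras"],
     "mass": [21.0, 720.0, 107000.0, 1914.0, 780.0]},
    index=[1880, 1951, 1952, 1976, 1902],
)

result = select_index_range(df, 1951, 1976)
print(result)
```

On a sorted index the bounds do not have to be existing labels, so select_index_range(df, 1900, 1960) should also work and return the rows for 1902, 1951, and 1952.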

Upvotes: 5

Bill Armstrong

Reputation: 1777

To set up your data:

In [29]: import pandas as pd

In [30]: df = pd.DataFrame({1880:[ 'Aachen',   1,   'Valid',   'L5',          
                                   21.0,     'Fell',    50.77500,    
                                   6.08333,    (50.775000, 6.083330)],
                            1951:[ 'Aarhus',   2,   'Valid',   'H6',          
                                   720.0,    'Fell',    56.18333,   
                                   10.23333,   (56.183330, 10.233330)],
                            1952:[ 'Abee',     6,   'Valid',   'EH4',         
                                   107000.0, 'Fell',    54.21667, 
                                   -113.00000, (54.216670, -113.000000)],
                            1976:[ 'Acapulco', 10,  'Valid',   'Acapulcoite', 
                                   1914.0,   'Fell',    16.88333,  
                                   -99.90000,  (16.883330, -99.900000)],
                            1902:[ 'Achiras',  370, 'Valid',   'L6',          
                                   780.0,    'Fell',   -33.16667,  
                                   -64.95000, (-33.166670, -64.950000)]}).T.sort_index()  # sort by index; newer pandas keeps dict insertion order

In [31]: df
Out[31]: 
             0    1      2            3       4     5        6        7  \
1880    Aachen    1  Valid           L5      21  Fell   50.775  6.08333   
1902   Achiras  370  Valid           L6     780  Fell -33.1667   -64.95   
1951    Aarhus    2  Valid           H6     720  Fell  56.1833  10.2333   
1952      Abee    6  Valid          EH4  107000  Fell  54.2167     -113   
1976  Acapulco   10  Valid  Acapulcoite    1914  Fell  16.8833    -99.9   

                         8  
1880     (50.775, 6.08333)  
1902   (-33.16667, -64.95)  
1951  (56.18333, 10.23333)  
1952    (54.21667, -113.0)  
1976     (16.88333, -99.9)  

There are several ways to do this:

Using row positions (by number):

In [32]: def get_range(df, start, finish):
             # iloc slices by position; finish is excluded, as in a Python slice
             return df.iloc[start:finish]

In [33]: print(get_range(df, 2, 4))
           0  1      2    3       4     5        6        7  \
1951  Aarhus  2  Valid   H6     720  Fell  56.1833  10.2333   
1952    Abee  6  Valid  EH4  107000  Fell  54.2167     -113   

                         8  
1951  (56.18333, 10.23333)  
1952    (54.21667, -113.0)
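
Positional and label-based selection treat the end of the range differently, which is easy to trip over when the index itself holds integers. A small sketch of the contrast, using a one-column frame whose column name is an assumption made for illustration:

```python
import pandas as pd

# Cut-down, index-sorted version of the data; the column name is an assumption.
df = pd.DataFrame(
    {"name": ["Aachen", "Achiras", "Aarhus", "Abee", "Acapulco"]},
    index=[1880, 1902, 1951, 1952, 1976],
)

by_position = df.iloc[2:4]    # rows at positions 2 and 3; position 4 is excluded
by_label = df.loc[1951:1952]  # labels 1951 through 1952; both ends are included

print(by_position.equals(by_label))
```

Both selections pick the 1951 and 1952 rows here, but only because the bounds were adjusted for the half-open versus inclusive convention.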

Or, if your data is sorted by index and you want the rows between two known index labels (following the slicing convention, where start is included and finish is excluded), you can:

In [34]: def get_range(df, start, finish):
             on = False     # True while we are inside the requested range
             labels = []    # index labels of the rows to keep
             for i in df.index:
                 if i == start:
                     on = True
                     labels.append(i)
                 elif on:
                     if i == finish:
                         on = False    # finish itself is excluded
                     else:
                         labels.append(i)
             return df.loc[labels]

In [35]: print(get_range(df, 1902, 1952))
            0    1      2   3    4     5        6        7  \
1902  Achiras  370  Valid  L6  780  Fell -33.1667   -64.95   
1951   Aarhus    2  Valid  H6  720  Fell  56.1833  10.2333   

                         8  
1902   (-33.16667, -64.95)  
1951  (56.18333, 10.23333)  

Plenty of room to improve on the code above...
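
One possible improvement is to replace the loop with a boolean mask on the index. This is a sketch rather than a drop-in replacement: the name get_range_mask and the cut-down data are assumptions made for illustration, and unlike the loop it does not require start or finish to be existing labels.

```python
import pandas as pd

def get_range_mask(df, start, finish):
    """Return rows with start <= index label < finish (half-open, like a slice)."""
    # Hypothetical helper: compare the whole index at once instead of looping.
    mask = (df.index >= start) & (df.index < finish)
    return df[mask]

# Cut-down, index-sorted version of the data; the column name is an assumption.
df = pd.DataFrame(
    {"name": ["Aachen", "Achiras", "Aarhus", "Abee", "Acapulco"]},
    index=[1880, 1902, 1951, 1952, 1976],
)

subset = get_range_mask(df, 1902, 1952)
print(subset)
```

The mask keeps rows in their original order and does not depend on the index being sorted, so it selects by value wherever the matching rows happen to sit.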

Upvotes: 0
