Reputation: 49
I need to write a function that selects a range of rows by index value (the first column).
1880 Aachen 1 Valid L5 21.0 Fell 50.77500 6.08333 (50.775000, 6.083330)
1951 Aarhus 2 Valid H6 720.0 Fell 56.18333 10.23333 (56.183330, 10.233330)
1952 Abee 6 Valid EH4 107000.0 Fell 54.21667 -113.00000 (54.216670, -113.000000)
1976 Acapulco 10 Valid Acapulcoite 1914.0 Fell 16.88333 -99.90000 (16.883330, -99.900000)
1902 Achiras 370 Valid L6 780.0 Fell -33.16667 -64.95000 (-33.166670, -64.950000)
How can I do this?
Upvotes: 2
Views: 13501
Reputation: 175
As of the time of writing this answer, you can use the DataFrame property loc in pandas. Here is an extract from the online documentation:
Access a group of rows and columns by label(s) or a boolean array.
Creating a DataFrame with your data, as done by Bill Armstrong, and slicing it with loc produces the following result, with no need to write a new function:
print(df.loc[1951:1976])
0 1 2 ... 6 7 8
1951 Aarhus 2 Valid ... 56.18333 10.23333 (56.18333, 10.23333)
1952 Abee 6 Valid ... 54.21667 -113.0 (54.21667, -113.0)
1976 Acapulco 10 Valid ... 16.88333 -99.9 (16.88333, -99.9)
[3 rows x 9 columns]
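Two details of label slicing are easy to miss, so here is a minimal sketch of them. The column names and the cut-down data are made up for illustration; only the index values come from the question. It assumes the index is sorted first, so that the slice selects by value rather than by row order.

```python
import pandas as pd

# Hypothetical cut-down version of the question's data (column names are invented).
df = pd.DataFrame(
    {
        "name": ["Aachen", "Aarhus", "Abee", "Acapulco", "Achiras"],
        "mass": [21.0, 720.0, 107000.0, 1914.0, 780.0],
    },
    index=[1880, 1951, 1952, 1976, 1902],
)

# Sort the index so that label slices select by value, not by row order.
df = df.sort_index()

# Unlike positional slices, a .loc label slice includes BOTH endpoints.
subset = df.loc[1951:1976]

# On a sorted index the bounds do not have to be existing labels.
between = df.loc[1900:1960]

print(subset)
print(between)
```

With the index sorted, the first slice keeps 1951, 1952 and 1976, and the second keeps every year from 1900 through 1960 even though neither bound appears in the index.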
Upvotes: 5
Reputation: 1777
To set up your data:
In [29]: import pandas as pd
In [30]: df = pd.DataFrame({1880: ['Aachen', 1, 'Valid', 'L5',
   ....:                          21.0, 'Fell', 50.77500,
   ....:                          6.08333, (50.775000, 6.083330)],
   ....:                    1951: ['Aarhus', 2, 'Valid', 'H6',
   ....:                          720.0, 'Fell', 56.18333,
   ....:                          10.23333, (56.183330, 10.233330)],
   ....:                    1952: ['Abee', 6, 'Valid', 'EH4',
   ....:                          107000.0, 'Fell', 54.21667,
   ....:                          -113.00000, (54.216670, -113.000000)],
   ....:                    1976: ['Acapulco', 10, 'Valid', 'Acapulcoite',
   ....:                          1914.0, 'Fell', 16.88333,
   ....:                          -99.90000, (16.883330, -99.900000)],
   ....:                    1902: ['Achiras', 370, 'Valid', 'L6',
   ....:                          780.0, 'Fell', -33.16667,
   ....:                          -64.95000, (-33.166670, -64.950000)]}).T
In [31]: df
Out[31]:
0 1 2 3 4 5 6 7 \
1880 Aachen 1 Valid L5 21 Fell 50.775 6.08333
1902 Achiras 370 Valid L6 780 Fell -33.1667 -64.95
1951 Aarhus 2 Valid H6 720 Fell 56.1833 10.2333
1952 Abee 6 Valid EH4 107000 Fell 54.2167 -113
1976 Acapulco 10 Valid Acapulcoite 1914 Fell 16.8833 -99.9
8
1880 (50.775, 6.08333)
1902 (-33.16667, -64.95)
1951 (56.18333, 10.23333)
1952 (54.21667, -113.0)
1976 (16.88333, -99.9)
There are several ways to do this:
Using the positional (integer) index:
In [32]: def get_range(df, start, finish):
   ....:     return df[start:finish]
In [33]: print(get_range(df, 2, 4))
0 1 2 3 4 5 6 7 \
1951 Aarhus 2 Valid H6 720 Fell 56.1833 10.2333
1952 Abee 6 Valid EH4 107000 Fell 54.2167 -113
8
1951 (56.18333, 10.23333)
1952 (54.21667, -113.0)
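Because the index here is itself made of integers, a plain df[start:finish] can be misread as a label lookup. A minimal sketch of the same idea with .iloc, which states the positional intent explicitly, is below. The function name get_range_by_position and the single-column data are hypothetical, chosen only for illustration.

```python
import pandas as pd

def get_range_by_position(df, start, finish):
    """Return the rows at positions start..finish-1 (half-open, like list slicing)."""
    return df.iloc[start:finish]

# Hypothetical cut-down data; only the index values come from the question.
df = pd.DataFrame(
    {"name": ["Aachen", "Achiras", "Aarhus", "Abee", "Acapulco"]},
    index=[1880, 1902, 1951, 1952, 1976],
)

result = get_range_by_position(df, 2, 4)
print(result)
```

Positions 2 and 3 correspond to the rows labelled 1951 and 1952, matching the output above.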
Or, if your data is ordered and you want the rows between two known index labels (following the slicing convention: start included, finish excluded), you can walk the rows:
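In [34]: def get_range(df, start, finish):
   ....:     on = False       # True while we are inside the requested range
   ....:     df_list = []     # index labels of the rows to keep
   ....:     for i, row in df.iterrows():
   ....:         if i == start:
   ....:             on = True
   ....:             df_list.append(i)
   ....:         elif on:
   ....:             if i == finish:
   ....:                 on = False   # finish itself is excluded
   ....:             else:
   ....:                 df_list.append(i)
   ....:     return df.loc[df_list]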
In [35]: print(get_range(df, 1902, 1952))
0 1 2 3 4 5 6 7 \
1902 Achiras 370 Valid L6 780 Fell -33.1667 -64.95
1951 Aarhus 2 Valid H6 720 Fell 56.1833 10.2333
8
1902 (-33.16667, -64.95)
1951 (56.18333, 10.23333)
Plenty of room to improve on the code above...
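One possible improvement, sketched here under the assumption that the index holds comparable values such as years, is to replace the loop with a boolean mask. The name get_range_by_label and the single-column data are hypothetical. Note that this selects by value, so unlike the loop it does not depend on row order or on start being an existing label.

```python
import pandas as pd

def get_range_by_label(df, start, finish):
    """Return rows whose index value lies in the half-open range [start, finish)."""
    mask = (df.index >= start) & (df.index < finish)
    return df.loc[mask]

# Hypothetical cut-down data; only the index values come from the question.
df = pd.DataFrame(
    {"name": ["Aachen", "Achiras", "Aarhus", "Abee", "Acapulco"]},
    index=[1880, 1902, 1951, 1952, 1976],
)

result = get_range_by_label(df, 1902, 1952)
print(result)
```

For the sorted data above this keeps 1902 and 1951, the same rows as the loop version, without iterating in Python.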
Upvotes: 0