Reputation: 1
This is my first time using Stack Overflow, so please forgive me if my question doesn't follow proper conventions.
I'm trying to write a function that finds the station with the most riders on the first day and returns the mean riders per day for that station, along with the overall mean ridership. However, when I ran the following code, a KeyError was raised, as shown below. Please advise what went wrong. Thank you very much!
import pandas as pd

def mean_riders_for_max_station(ridership_df):
    overall_mean = ridership_df.mean()
    max_station = ridership_df.iloc[0].argmax()
    mean_for_max = ridership_df[max_station].mean()
    return (overall_mean, mean_for_max)

ridership_df = pd.DataFrame(
    data=[[   0,    0,    2,    5,    0],
          [1478, 3877, 3674, 2328, 2539],
          [1613, 4088, 3991, 6461, 2691],
          [1560, 3392, 3826, 4787, 2613],
          [1608, 4802, 3932, 4477, 2705],
          [1576, 3933, 3909, 4979, 2685],
          [  95,  229,  255,  496,  201],
          [   2,    0,    1,   27,    0],
          [1438, 3785, 3589, 4174, 2215],
          [1342, 4043, 4009, 4665, 3033]],
    index=['05-01-11', '05-02-11', '05-03-11', '05-04-11', '05-05-11',
           '05-06-11', '05-07-11', '05-08-11', '05-09-11', '05-10-11'],
    columns=['R003', 'R004', 'R005', 'R006', 'R007']
)

print(mean_riders_for_max_station(ridership_df))
I received the following error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 3
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-23-60b53dc0106e> in <module>
37 )
38
---> 39 mean_riders_for_max_station(ridership_df)
<ipython-input-23-60b53dc0106e> in mean_riders_for_max_station(ridership_df)
17
18 max_station = ridership_df.iloc[0].argmax() #difference between argmax() for an array (--returning a location)
---> 19 mean_for_max = ridership_df[max_station].mean() #and argmax() for a series: returning index (or column name of the dataframe)
20 return (overall_mean, mean_for_max)
21
~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: 3
Upvotes: 0
Views: 2725
Reputation: 348
max_station will be 3 (the position of the maximum value in the first row), but ridership_df[max_station] raises a KeyError because there is no column named 3.
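A small sketch of what this answer describes, using only the first row of the question's data. It assumes a recent pandas (1.0 or later), where Series.argmax returns an integer position; that position can be turned back into a column label by indexing the Series' index with it.

```python
import pandas as pd

# First-day counts from the question's DataFrame (only row 0 is needed here)
first_day = pd.Series([0, 0, 2, 5, 0],
                      index=['R003', 'R004', 'R005', 'R006', 'R007'])

pos = first_day.argmax()      # integer position of the maximum value
label = first_day.index[pos]  # translate the position back into a label

print(pos)    # 3
print(label)  # R006
```

Indexing the DataFrame with label works, whereas indexing it with pos reproduces the KeyError from the question.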
Upvotes: 0
Reputation: 81
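The argmax() method of a pandas Series returns the position of the maximum value (i.e. its integer position in the underlying array), not its label.
What you want is max_station = ridership_df.iloc[0].idxmax(), which returns the column label ('R006' for the first row of your data).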
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.argmax.html
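Putting the answers together, a minimal corrected version of the question's function might look like the sketch below. It swaps argmax() for idxmax() as the answer suggests. It also makes one assumption not stated in the thread: that "mean ridership overall" means the mean of every value in the table. Note that ridership_df.mean() on a DataFrame returns one mean per column (a Series), so this sketch flattens the data with to_numpy() first.

```python
import pandas as pd

def mean_riders_for_max_station(ridership_df):
    # Assumption: "overall mean" = mean over every cell in the table,
    # not the per-column means that ridership_df.mean() would return.
    overall_mean = ridership_df.to_numpy().mean()
    # idxmax returns the column *label* of the first-day maximum
    max_station = ridership_df.iloc[0].idxmax()
    mean_for_max = ridership_df[max_station].mean()
    return (overall_mean, mean_for_max)

ridership_df = pd.DataFrame(
    data=[[   0,    0,    2,    5,    0],
          [1478, 3877, 3674, 2328, 2539],
          [1613, 4088, 3991, 6461, 2691],
          [1560, 3392, 3826, 4787, 2613],
          [1608, 4802, 3932, 4477, 2705],
          [1576, 3933, 3909, 4979, 2685],
          [  95,  229,  255,  496,  201],
          [   2,    0,    1,   27,    0],
          [1438, 3785, 3589, 4174, 2215],
          [1342, 4043, 4009, 4665, 3033]],
    index=['05-01-11', '05-02-11', '05-03-11', '05-04-11', '05-05-11',
           '05-06-11', '05-07-11', '05-08-11', '05-09-11', '05-10-11'],
    columns=['R003', 'R004', 'R005', 'R006', 'R007']
)

print(mean_riders_for_max_station(ridership_df))
```

If the per-column means were actually what you wanted, keep ridership_df.mean() for overall_mean; only the idxmax() change is needed to fix the KeyError.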
Upvotes: 1