NikTheBoss

Reputation: 35

Get value from Pandas Series

I was working with some temperature data in Pandas.

From a DataFrame called 'data' I got the first observation with this line of code:

first_obs = data['DATE'][0]

Keep in mind that data['DATE'] is a pandas.Series object. The columns of data are: STATION ELEVATION LATITUDE LONGITUDE DATE PRCP TAVG TMAX TMIN YEAR MONTH

After some data manipulation I created a new DataFrame 'monthly_data' with these columns: MONTH TAVG YEAR temp_celsius ref_temp diff abs_diff

Next I wanted to get the row of this DataFrame with the maximum value in the 'abs_diff' column:

weather_anomaly = monthly_data.loc[monthly_data['abs_diff'] == monthly_data['abs_diff'].max()]

weather_anomaly is another DataFrame object, and this is where the strange problem appears. If I write the same kind of code as before:

weather_anomaly['MONTH'][0]

I get this error:

KeyError Traceback (most recent call last)
~\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3079 try:
-> 3080 return self._engine.get_loc(casted_key)
   3081 except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last) in
----> 1 weather_anomaly['MONTH'][0]
      2 print('The month with the greatest temperature anomaly is ', weather_anomaly['MONTH'].values[0], 'of the year ', weather_anomaly['YEAR'].values[0], ' with a difference of ', weather_anomaly['diff'].values[0])

~\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    851
    852 elif key_is_scalar:
--> 853 return self._get_value(key)
    854
    855 if is_hashable(key):

~\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
    959
    960 # Similar to Index.get_value, but we do not fall back to positional
--> 961 loc = self.index.get_loc(label)
    962 return self.index._get_values_for_loc(self, loc, label)
    963

~\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3080 return self._engine.get_loc(casted_key)
   3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
   3083
   3084 if tolerance is not None:

KeyError: 0

The message does not explain much. Fortunately the fix for this problem is easy:

weather_anomaly['MONTH'].values[0]

So my final question is: given that data['DATE'] and monthly_data['abs_diff'] are both pandas.Series objects, why does weather_anomaly['abs_diff'][0] not work?

Upvotes: 2

Views: 5034

Answers (2)

tlouarn

Reputation: 171

I assume your original DataFrame has the default index of incrementing integers starting at 0, so in your first example data['DATE'][0] (a lookup by label) and data['DATE'].iloc[0] (a lookup by position) happen to return the same result.

But after you select a specific row with the max() condition, the new DataFrame weather_anomaly contains only one row, and that row keeps its original index label, which may not be zero.
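The behaviour described above can be reproduced with a small sketch. The frame below is a made-up stand-in for monthly_data (only two of its columns, with invented values), so treat it as an illustration rather than the asker's actual data:

```python
import pandas as pd

# Hypothetical stand-in for monthly_data; the values are invented.
monthly_data = pd.DataFrame({
    "MONTH": [1, 2, 3, 4],
    "abs_diff": [0.5, 1.2, 3.4, 0.9],
})

# Boolean filtering keeps the original index labels of the matching rows.
weather_anomaly = monthly_data.loc[
    monthly_data["abs_diff"] == monthly_data["abs_diff"].max()
]

# The only surviving row still carries its original label (2 here, not 0).
index_labels = list(weather_anomaly.index)

# With an integer index, [0] is a lookup by label, and no row is labelled 0.
try:
    weather_anomaly["MONTH"][0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(index_labels, lookup_failed)
```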

Therefore, to select the first row of weather_anomaly, you need to either use .iloc[0], or call reset_index() and then use [0] on the result.

I suggest printing your DataFrames; you will clearly see how the index behaves.
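Both options can be sketched as follows, again on an invented stand-in for monthly_data. Passing drop=True to reset_index() is an assumption added here so that the old labels are discarded instead of being kept as an extra column:

```python
import pandas as pd

# Hypothetical stand-in for monthly_data; the values are invented.
monthly_data = pd.DataFrame(
    {"MONTH": [1, 2, 3, 4], "abs_diff": [0.5, 1.2, 3.4, 0.9]}
)
weather_anomaly = monthly_data.loc[
    monthly_data["abs_diff"] == monthly_data["abs_diff"].max()
]

# Option 1: positional access, which ignores the index labels entirely.
month_by_position = weather_anomaly["MONTH"].iloc[0]

# Option 2: renumber the rows from 0 so that the label 0 exists again.
renumbered = weather_anomaly.reset_index(drop=True)
month_after_reset = renumbered["MONTH"][0]

# Printing both frames shows the index before and after the reset.
print(weather_anomaly)
print(renumbered)
```

Printing the two frames side by side makes the difference visible: the first keeps the row's original label, the second starts again at 0.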

Upvotes: 2

Ynjxsjmh

Reputation: 30032

TL;DR: The index of weather_anomaly['MONTH'] is not the default 0-based integer range, so the label 0 does not exist in it.

After some data manipulation I created a new DataFrame monthly_data with these columns: MONTH TAVG YEAR temp_celsius ref_temp diff abs_diff

weather_anomaly = monthly_data.loc[monthly_data['abs_diff'] == monthly_data['abs_diff'].max()]

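As you describe above, monthly_data is the result of some data manipulation, so its index is probably no longer the default 0, 1, 2, ... range. Filtering monthly_data to get weather_anomaly keeps the index labels of the selected rows, so the index of weather_anomaly looks like that of monthly_data.

If you want to select from a Series by integer position, you can use pandas.Series.iloc. In your example: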

weather_anomaly['MONTH'].iloc[0]
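A minimal sketch of this on a frame whose index is not the default range. The labels 10, 20, 30 and all values are invented for illustration. The idxmax() variant at the end is an alternative not mentioned in the answer, shown only as one possible way to reach the same row by label:

```python
import pandas as pd

# Hypothetical stand-in for monthly_data with non-default index labels.
monthly_data = pd.DataFrame(
    {
        "MONTH": [11, 12, 1],
        "YEAR": [2019, 2019, 2020],
        "abs_diff": [0.4, 2.5, 1.1],
    },
    index=[10, 20, 30],
)

weather_anomaly = monthly_data.loc[
    monthly_data["abs_diff"] == monthly_data["abs_diff"].max()
]

# .iloc selects by position, so it works whatever the labels are.
first_month = weather_anomaly["MONTH"].iloc[0]

# Alternative: ask for the label of the maximum and look it up with .loc.
max_label = monthly_data["abs_diff"].idxmax()
month_via_label = monthly_data.loc[max_label, "MONTH"]

print(first_month, max_label, month_via_label)
```

Note that idxmax() returns only the first label if several rows share the maximum, whereas the boolean filter keeps all of them.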

Upvotes: 0
