Reputation: 402
I have a census dataset indexed by state name and county name. For each row, I want to find the maximum and minimum across all the population-estimate columns (one per year, POPESTIMATE2010 through POPESTIMATE2015) and subtract the minimum from the maximum. The function should return a pandas Series with the index and the resulting values.
Here is my current code:
```python
columns_to_keep=[
    'STNAME',
    'CTYNAME',
    'POPESTIMATE2010',
    'POPESTIMATE2011',
    'POPESTIMATE2012',
    'POPESTIMATE2013',
    'POPESTIMATE2014',
    'POPESTIMATE2015'
    ]
df=census_df[columns_to_keep]
def answer_seven(lst):
    lst=[df['POPESTIMATE2010'],df['POPESTIMATE2011'],df['POPESTIMATE2012'],
    df['POPESTIMATE2013'],df['POPESTIMATE2014'],df['POPESTIMATE2015']]
    return max(lst)-min(lst)
answer_seven(lst)
```
Error message:

```
ValueError                                Traceback (most recent call last)
<ipython-input-110-845350b0b5f7> in <module>()
     18     return max(lst)-min(lst)
     19
---> 20 answer_seven(lst)
     21

<ipython-input-110-845350b0b5f7> in answer_seven(lst)
     16     df['POPESTIMATE2013'],df['POPESTIMATE2014'],df['POPESTIMATE2015']]
     17
---> 18     return max(lst)-min(lst)
     19
     20 answer_seven(lst)

/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in __nonzero__(self)
    890         raise ValueError("The truth value of a {0} is ambiguous. "
    891                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 892                          .format(self.__class__.__name__))
    893
    894     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```
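The error most likely comes from calling Python's built-in `max()`/`min()` on a list of Series. To pick the larger item, `max()` compares two Series with `>`, which pandas evaluates element-wise and returns a boolean Series; `max()` then needs a single True/False from that Series, and pandas raises instead of guessing. A minimal reproduction, using two small hypothetical Series in place of the census columns:

```python
import pandas as pd

# Two hypothetical yearly columns for three counties (illustrative numbers only).
pop_2010 = pd.Series([100, 250, 80])
pop_2011 = pd.Series([120, 240, 90])

# Built-in max() compares the list items with ">", which on two Series
# yields a boolean Series. max() then asks for that Series' truth value,
# and pandas raises ValueError because the answer is ambiguous.
try:
    max([pop_2010, pop_2011])
except ValueError as err:
    message = str(err)
    print(message)
```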
Upvotes: 2
Views: 6776
Reputation: 81
I had trouble with NaN values that I needed to keep, so I used the following, which computes max minus min for each column of `df_count`:
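```python
import pandas as pd

x = {}
for col in df_count:  # iterating over a DataFrame yields its column labels
    # Series.max() and Series.min() skip NaN by default, so the NaNs can stay in df_count
    x[col] = df_count[col].max() - df_count[col].min()
pd.Series(x)
```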
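That loop produces one range per column. For the per-row range the question asks about, the same dictionary idea can iterate over rows with `DataFrame.iterrows()` instead. A minimal sketch, assuming a small hypothetical frame (made-up county labels and numbers, one value deliberately missing) in place of the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the census data: one row per county,
# one column per yearly estimate, with a NaN to show it is tolerated.
df = pd.DataFrame(
    {
        "POPESTIMATE2010": [100, 250, 80],
        "POPESTIMATE2011": [120, np.nan, 90],
        "POPESTIMATE2012": [110, 230, 95],
    },
    index=["County A", "County B", "County C"],
)

x = {}
for idx, row in df.iterrows():  # each row arrives as a Series keyed by column name
    # max()/min() skip NaN by default (skipna=True), so County B still gets a range.
    x[idx] = row.max() - row.min()

row_range = pd.Series(x)
print(row_range)
```

Looping with `iterrows()` is fine for a few thousand counties, but the vectorized `max(axis=1)`/`min(axis=1)` form in the other answer avoids the Python-level loop entirely.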
Upvotes: 0
Reputation: 40878
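Or consider `numpy.ptp` for speed. Its documentation describes it as "Range of values (maximum - minimum) along an axis."

```python
import numpy as np
np.ptp(df[cols_of_interest].values, axis=1)  # row-wise max - min, returned as a NumPy array
```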
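Note that `np.ptp` returns a plain NumPy array, so the DataFrame's index is lost; since the question wants a Series, the result can be wrapped with the original index. Unlike the pandas `max`/`min` methods, `np.ptp` does not skip NaN, so a row containing NaN comes back as NaN. A sketch on a small hypothetical frame (made-up labels and numbers):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the question's column naming scheme.
df = pd.DataFrame(
    {
        "POPESTIMATE2010": [100, 250, 80],
        "POPESTIMATE2011": [120, 240, 90],
        "POPESTIMATE2012": [110, 230, 95],
    },
    index=["County A", "County B", "County C"],
)
cols_of_interest = list(df.columns)

# axis=1 gives one max - min per row; wrapping restores the county index.
spread = pd.Series(np.ptp(df[cols_of_interest].to_numpy(), axis=1), index=df.index)
print(spread)

# NaN propagates through np.ptp rather than being skipped.
nan_row = np.ptp(np.array([[1.0, np.nan, 3.0]]), axis=1)
print(nan_row)
```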
Upvotes: 2
Reputation: 13705
Pandas can do this directly:
```python
cols_of_interest = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012', 'POPESTIMATE2013', 'POPESTIMATE2014', 'POPESTIMATE2015']
df[cols_of_interest].max(axis=1) - df[cols_of_interest].min(axis=1)
```
This returns a Series indexed by the DataFrame's original index, whose values are each row's maximum minus its minimum.
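To package this the way the question asked (a function that returns a Series indexed by state and county), a sketch might look like the following. The name `answer_seven` comes from the question; its `df` parameter, the `set_index` step (which assumes STNAME and CTYNAME are ordinary columns, as in `columns_to_keep`), and the sample numbers are illustrative assumptions.

```python
import pandas as pd


def answer_seven(df):
    """Per-row range (max - min) of the 2010-2015 population estimates.

    Hypothetical wrapper: assumes STNAME and CTYNAME are ordinary columns,
    as in the question's columns_to_keep.
    """
    cols = [f"POPESTIMATE{year}" for year in range(2010, 2016)]
    estimates = df.set_index(["STNAME", "CTYNAME"])[cols]
    # axis=1 reduces across the columns of each row.
    return estimates.max(axis=1) - estimates.min(axis=1)


# Made-up sample data with the same column layout as the question.
sample = pd.DataFrame({"STNAME": ["State X", "State X"], "CTYNAME": ["County A", "County B"]})
for i, year in enumerate(range(2010, 2016)):
    sample[f"POPESTIMATE{year}"] = [100 + 5 * i, 300 - 10 * i]

result = answer_seven(sample)
print(result)
```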
Upvotes: 3