Tokaalmighty

Reputation: 402

Subtract the minimum value from the maximum value across each row, Python Pandas DataFrame

I have a census dataset indexed by state name and county name. For each row, I want to find the maximum and minimum values across the yearly population estimate columns (POPESTIMATE2010 through POPESTIMATE2015) and subtract the minimum from the maximum. The function should return a Pandas Series that keeps the original index, with the difference as the value.

Here is my current code:

columns_to_keep=[
    'STNAME',
    'CTYNAME',
    'POPESTIMATE2010',
    'POPESTIMATE2011',
    'POPESTIMATE2012',
    'POPESTIMATE2013',
    'POPESTIMATE2014',
    'POPESTIMATE2015' 
]
df=census_df[columns_to_keep]

def answer_seven(lst):
    lst=[df['POPESTIMATE2010'],df['POPESTIMATE2011'],df['POPESTIMATE2012'],
             df['POPESTIMATE2013'],df['POPESTIMATE2014'],df['POPESTIMATE2015']]

    return max(lst)-min(lst)

answer_seven(lst)

It fails with this error message:

ValueError                                Traceback (most recent call last)
<ipython-input-110-845350b0b5f7> in <module>()
     18     return max(lst)-min(lst)
     19 
---> 20 answer_seven(lst)
     21 

<ipython-input-110-845350b0b5f7> in answer_seven(lst)
     16              df['POPESTIMATE2013'],df['POPESTIMATE2014'],df['POPESTIMATE2015']]
     17 
---> 18     return max(lst)-min(lst)
     19 
     20 answer_seven(lst)

/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in __nonzero__(self)
    890         raise ValueError("The truth value of a {0} is ambiguous. "
    891                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 892                          .format(self.__class__.__name__))
    893 
    894     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 2

Views: 6776

Answers (3)

CJW

Reputation: 81

I had trouble with NaN values that I needed to keep, so I used the following, which computes the max minus the min for each column of df_count:

x = {}
for col in df_count:
    x[col] = df_count[col].max()- df_count[col].min()
pd.Series(x)
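For the row-wise case asked about in the question, NaN handling can be controlled with the skipna parameter of DataFrame.max and DataFrame.min. The following is a minimal sketch, assuming that "keeping" NaN means a row with any missing estimate should produce NaN; the toy frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names mirror the question, values are invented.
toy = pd.DataFrame(
    {
        "POPESTIMATE2010": [100.0, 50.0],
        "POPESTIMATE2011": [120.0, np.nan],
        "POPESTIMATE2012": [90.0, 70.0],
    },
    index=["County A", "County B"],
)

# Default behaviour (skipna=True): NaN is ignored within each row.
range_skipping_nan = toy.max(axis=1) - toy.min(axis=1)

# skipna=False: a NaN anywhere in a row makes that row's result NaN.
range_keeping_nan = toy.max(axis=1, skipna=False) - toy.min(axis=1, skipna=False)

print(range_skipping_nan)
print(range_keeping_nan)
```

Which of the two is appropriate depends on whether a missing year should invalidate the whole row or simply be left out of the comparison.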

Upvotes: 0

Brad Solomon

Reputation: 40878

Or consider numpy.ptp ("peak to peak") for speed. From the NumPy documentation:

"Range of values (maximum - minimum) along an axis."

np.ptp(df[cols_of_interest].values, axis=1)
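Note that np.ptp returns a plain NumPy array, so the state and county labels are lost. Since the question asks for a Series, here is a minimal sketch of wrapping the result back up with the original index. The toy frame, its values, and the name ptp_range are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data indexed by state and county, as in the question.
toy = pd.DataFrame(
    {
        "STNAME": ["Alabama", "Alabama"],
        "CTYNAME": ["Autauga County", "Baldwin County"],
        "POPESTIMATE2010": [100, 200],
        "POPESTIMATE2011": [130, 180],
        "POPESTIMATE2012": [110, 250],
    }
).set_index(["STNAME", "CTYNAME"])

cols_of_interest = ["POPESTIMATE2010", "POPESTIMATE2011", "POPESTIMATE2012"]

# np.ptp works on the underlying array; axis=1 gives one value per row.
row_ranges = np.ptp(toy[cols_of_interest].to_numpy(), axis=1)

# Re-attach the original index so the result is a labelled Series.
ptp_range = pd.Series(row_ranges, index=toy.index)
print(ptp_range)
```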

Upvotes: 2

johnchase

Reputation: 13705

Pandas can do this directly:

cols_of_interest = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012',
                    'POPESTIMATE2013', 'POPESTIMATE2014', 'POPESTIMATE2015']
df[cols_of_interest].max(axis=1) - df[cols_of_interest].min(axis=1)

This returns a Series that keeps the original index of your DataFrame, where each value is that row's maximum minus its minimum across the selected columns.
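For context, the built-in max() in the question fails because it compares whole Series with each other using >, which yields another Series, and Python then cannot reduce that to a single True or False. Below is a small sketch of the row-wise approach wrapped in a function; the name population_range and the toy data are hypothetical and stand in for the real census frame.

```python
import pandas as pd


def population_range(frame, cols):
    """Return, for each row, the max minus the min across `cols` as a Series."""
    return frame[cols].max(axis=1) - frame[cols].min(axis=1)


# Hypothetical toy data indexed by state and county, as in the question.
toy = pd.DataFrame(
    {
        "STNAME": ["Alabama", "Alabama"],
        "CTYNAME": ["Autauga County", "Baldwin County"],
        "POPESTIMATE2010": [100, 200],
        "POPESTIMATE2011": [130, 180],
        "POPESTIMATE2012": [110, 250],
    }
).set_index(["STNAME", "CTYNAME"])

cols_of_interest = ["POPESTIMATE2010", "POPESTIMATE2011", "POPESTIMATE2012"]

result = population_range(toy, cols_of_interest)
print(result)
```

The result carries the same (STNAME, CTYNAME) index as the input, so it can be sorted or looked up by label directly.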

Upvotes: 3
