Harsha
Harsha

Reputation: 583

Maximum difference of values in the columns of a pandas dataframe

I have a dataframe with 8 columns as follows:

  1. Index
  2. Name of the counties
  3. Population from 2010 Census
  4. Population from 2011 Census
  5. Population from 2012 Census
  6. Population from 2013 Census
  7. Population from 2014 Census
  8. Population from 2015 Census

I need to find the county that has had the largest absolute change in population within the period 2010-2015?

e.g. If County Population in the 5 year period is 100, 120, 80, 105, 100, 130, then its largest change in the period would be |130-80| = 50. I am able to come up with a solution using for loops and conditionals but it doesn't seem to be the best way to solve the problem. How can I write a simple code using pandas dataframe functions?

Upvotes: 3

Views: 1217

Answers (2)

hardik_pnp
hardik_pnp

Reputation: 29

import pandas as pd

df = pd.DataFrame(
    {
        "country": ["India", "US", "China"],
        "2010": [200, 100, 300],
        "2012": [400, 200, 500],
        "2015": [800, 400, 700],
    }
)

df["abs_change"] = df.apply(
    lambda x: max(x[df.columns[-3:]]) - min(x[df.columns[-3:]]), axis=1
)

print(df.iloc[df[["abs_change"]].idxmax()])

This might be a possible solution to your problem without using for loops. Although I am not 100% sure about the performance implications, it might be better than using for loops.

Upvotes: 1

busybear
busybear

Reputation: 10590

Use min and max methods for dataframe while setting the parameter axis to 1. If you set your column 'Name of the counties' as your index, it makes it a little easier. Then you can use idxmax to find which county has the largest range.

df = df.set_index('Name of the counties')
(df.max(axis=1) - df.min(axis=1)).idxmax())

Upvotes: 2

Related Questions