Reputation: 583
I have a dataframe with 8 columns as follows:
I need to find the county that has had the largest absolute change in population within the period 2010-2015?
e.g. If County Population in the 5 year period is 100, 120, 80, 105, 100, 130, then its largest change in the period would be |130-80| = 50. I am able to come up with a solution using for loops and conditionals but it doesn't seem to be the best way to solve the problem. How can I write a simple code using pandas dataframe functions?
Upvotes: 3
Views: 1217
Reputation: 29
import pandas as pd
df = pd.DataFrame(
{
"country": ["India", "US", "China"],
"2010": [200, 100, 300],
"2012": [400, 200, 500],
"2015": [800, 400, 700],
}
)
df["abs_change"] = df.apply(
lambda x: max(x[df.columns[-3:]]) - min(x[df.columns[-3:]]), axis=1
)
print(df.iloc[df[["abs_change"]].idxmax()])
This might be a possible solution to your problem without using for loops. Although I am not 100% sure about the performance implications, it might be better than using for loops.
Upvotes: 1
Reputation: 10590
Use min
and max
methods for dataframe while setting the parameter axis
to 1
. If you set your column 'Name of the counties'
as your index, it makes it a little easier. Then you can use idxmax
to find which county has the largest range.
df = df.set_index('Name of the counties')
(df.max(axis=1) - df.min(axis=1)).idxmax())
Upvotes: 2