Muhammad Ashfaq

Reputation: 94

Selecting all values greater than a number in a pandas DataFrame

I have a DataFrame like the one below, with more than 50 columns (one per year from 1963 to 2016). I want to select all countries with a population over a certain number (say 60 million). All the questions I found were about filtering on a single column, which is not the case here. I also tried df[df.T[(df.T > 0.33)].any()], as suggested in one answer, but it doesn't work. Any ideas?

The data frame looks like this:

Country Country_Code   Year_1979  Year_1999   Year_2013
  Aruba          ABW     59980.0      89005    103187.0
 Angola          AGO   8641521.0   15949766  25998340.0
Albania          ALB   2617832.0    3108778   2895092.0
Andorra          AND     34818.0      64370     80788.0

Upvotes: 2

Views: 2568

Answers (1)

jezrael

Reputation: 863226

First select only the columns with Year in their names using DataFrame.filter, compare every value against the threshold, and then use DataFrame.any with axis=1 to keep the rows where at least one value matches:

df1 = df[(df.filter(like='Year') > 2000000).any(axis=1)]
print (df1)
   Country Country_Code  Year_1979  Year_1999   Year_2013
1   Angola          AGO  8641521.0   15949766  25998340.0
2  Albania          ALB  2617832.0    3108778   2895092.0
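
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the four sample rows in the question; the names threshold, year_cols and mask are only illustrative, and 2000000 is used instead of 60 million so that the sample data returns some rows.

```python
import pandas as pd

# Sample rows copied from the question (only three of the 50+ year columns).
df = pd.DataFrame({
    "Country": ["Aruba", "Angola", "Albania", "Andorra"],
    "Country_Code": ["ABW", "AGO", "ALB", "AND"],
    "Year_1979": [59980.0, 8641521.0, 2617832.0, 34818.0],
    "Year_1999": [89005, 15949766, 3108778, 64370],
    "Year_2013": [103187.0, 25998340.0, 2895092.0, 80788.0],
})

# Illustrative threshold; replace with 60_000_000 for the real data.
threshold = 2_000_000

# Keep only the columns whose name contains "Year".
year_cols = df.filter(like="Year")

# Boolean Series: True where at least one year exceeds the threshold.
mask = (year_cols > threshold).any(axis=1)

df1 = df[mask]
print(df1["Country"].tolist())  # ['Angola', 'Albania']
```

Splitting the one-liner into year_cols and mask makes it easier to inspect the intermediate boolean frame if the result is not what you expect.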

Or compare all columns except the first two, selected by position with DataFrame.iloc:

df1 = df[(df.iloc[:, 2:] > 2000000).any(axis=1)]
print (df1)
   Country Country_Code  Year_1979  Year_1999   Year_2013
1   Angola          AGO  8641521.0   15949766  25998340.0
2  Albania          ALB  2617832.0    3108778   2895092.0
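
Note that any(axis=1) keeps a country if it exceeded the threshold in at least one year. If what you need is countries above the threshold in every year, DataFrame.all(axis=1) is the counterpart. The sketch below contrasts the two on the sample rows from the question; the threshold of 3000000 is an assumption chosen only because it makes the two results differ on this small sample.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Country": ["Aruba", "Angola", "Albania", "Andorra"],
    "Country_Code": ["ABW", "AGO", "ALB", "AND"],
    "Year_1979": [59980.0, 8641521.0, 2617832.0, 34818.0],
    "Year_1999": [89005, 15949766, 3108778, 64370],
    "Year_2013": [103187.0, 25998340.0, 2895092.0, 80788.0],
})

# Illustrative threshold, picked so that any() and all() disagree.
threshold = 3_000_000

# Element-wise comparison on every column after the first two.
above = df.iloc[:, 2:] > threshold

# At least one year above the threshold.
any_year = df[above.any(axis=1)]

# Every year above the threshold.
every_year = df[above.all(axis=1)]

print(any_year["Country"].tolist())    # ['Angola', 'Albania']
print(every_year["Country"].tolist())  # ['Angola']
```

Albania passes the any() test because only its 1999 value is above 3000000, but it fails the all() test.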

Upvotes: 3
