PANDAS dataframe python: wanting to sort values by group

Question

The CSV file at the URL in the code below contains the raw data that I wish to manipulate.

import pandas as pd

census_df = pd.read_csv('https://raw.githubusercontent.com/Qian-Han/coursera-Applied-Data-Science-with-Python/master/Introduction-to-Data-Science-in-Python/original_data/census.csv')
sortedit = census_df.sort_values(by=['STNAME', 'CENSUS2010POP'], ascending=False)

I am trying to sort the data in descending order by the 'CENSUS2010POP' column.

I also want to sort the data alphabetically by state, which is why I included the 'STNAME' column in the code above.
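Note that a single `ascending=False` applies to every column listed in `by`, so the call above also sorts `STNAME` in reverse alphabetical order (Z to A). Passing a list gives each column its own direction. A minimal sketch on a small made-up DataFrame (the state names and populations are illustrative, not taken from the CSV):

```python
import pandas as pd

# Illustrative data only; not from census.csv
df = pd.DataFrame({
    "STNAME": ["Alaska", "Alabama", "Alaska", "Alabama"],
    "CENSUS2010POP": [300, 100, 500, 200],
})

# A single ascending=False reverses BOTH columns: states come out Z-A
reverse_df = df.sort_values(by=["STNAME", "CENSUS2010POP"], ascending=False)

# One direction per column: STNAME A-Z, CENSUS2010POP largest first
sorted_df = df.sort_values(
    by=["STNAME", "CENSUS2010POP"],
    ascending=[True, False],
    ignore_index=True,  # renumber rows 0..n-1 after sorting
)
print(sorted_df)
```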

However, I only want to select the 3 highest values for 'CENSUS2010POP' from each state ('STNAME').

Thus, if there are 146 states in total, I should end up with 146 x 3 rows in my new DataFrame (and therefore in the 'CENSUS2010POP' column).
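The "states x 3" count only holds if every state has at least three rows; a state with fewer rows contributes all of them. A small sketch for computing the expected row count, again on made-up data where the hypothetical "Delaware" group has only two rows:

```python
import pandas as pd

# Illustrative data: "Delaware" has only two rows
df = pd.DataFrame({
    "STNAME": ["Alabama"] * 4 + ["Delaware"] * 2,
    "CENSUS2010POP": [10, 40, 20, 30, 5, 7],
})

# Naive estimate: 3 rows for every distinct state
naive_rows = 3 * df["STNAME"].nunique()

# Actual count: rows per state, capped at 3, summed over all states
expected_rows = df.groupby("STNAME").size().clip(upper=3).sum()
print(naive_rows, expected_rows)
```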

I would be very grateful if anybody could give me a helping hand.

Hamid · Accepted Answer

Try this:

df = census_df.groupby(["STNAME"]).apply(lambda x: x.sort_values(["CENSUS2010POP"], ascending = False)).reset_index(drop=True)

df.groupby('STNAME').head(3)[['STNAME','CENSUS2010POP']]

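The first statement returns a DataFrame sorted by CENSUS2010POP in descending order within each STNAME group; because groupby sorts its keys by default, the states themselves come out in alphabetical order.

The second statement keeps the first 3 rows of each state (its 3 largest CENSUS2010POP values) and selects only the STNAME and CENSUS2010POP columns.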
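The same result can likely be obtained without `groupby(...).apply(...)`: sort the whole frame once with a per-column direction, then take the first three rows of each group. This also sidesteps how `apply` treats the grouping column, which pandas 2.2 deprecated (it warns when `apply` operates on the grouping columns). A sketch on made-up data; `top3_per_state` is a hypothetical helper name:

```python
import pandas as pd


def top3_per_state(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the 3 largest CENSUS2010POP rows per STNAME."""
    ordered = df.sort_values(
        by=["STNAME", "CENSUS2010POP"],
        ascending=[True, False],  # states A-Z, population descending
    )
    # head(3) keeps the first 3 rows of each group in the current order
    top = ordered.groupby("STNAME").head(3)
    return top[["STNAME", "CENSUS2010POP"]].reset_index(drop=True)


# Illustrative data only; not from census.csv
df = pd.DataFrame({
    "STNAME": ["Alaska", "Alabama", "Alabama", "Alaska", "Alabama", "Alabama"],
    "CENSUS2010POP": [300, 100, 400, 500, 200, 50],
})
result = top3_per_state(df)
print(result)
```

With the real data this would be called as `top3_per_state(census_df)`.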
