Reputation: 11

Please explain the following line of code, i.e. creating a pandas Series using two columns of a DataFrame

industry_usa = f500["industry"][f500["country"] == "USA"].value_counts().head(2)

This is a DataFrame whose columns include industry and country. Why do we need to put the two columns side by side when creating the industry_usa Series? Please explain.

Upvotes: 0

Answers (1)

paradocslover

Reputation: 3294

I will break it down for you:

f500["industry"]: This selects the "industry" column of f500 as a Series.

f500["country"] == "USA": This compares every value in the "country" column with "USA" and returns a boolean Series (a mask) that is True for the rows whose country is USA and False for all the others.

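f500["industry"][f500["country"] == "USA"]: As you might have guessed, this is ordinary boolean indexing in pandas. The mask is used to filter the "industry" Series, so it keeps only the industries of the rows where the country is "USA". The two columns are not being combined; one supplies the values and the other supplies the filter.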

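.value_counts(): This counts how many times each unique value occurs and sorts the result from most to least frequent, much like the Counter class in Python's collections module. .head(2) then keeps the first two entries, i.e. the two most common industries among the USA companies.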
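To see each step in isolation, here is a minimal sketch on a made-up stand-in for f500. The real dataset is not shown in the question, so the rows below and the intermediate variable names (industry, is_usa, usa_industries, counts) are assumptions for illustration only.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data; the real f500 has many more rows and columns.
f500 = pd.DataFrame({
    "industry": ["Retail", "Energy", "Retail", "Banking",
                 "Energy", "Retail", "Energy", "Retail"],
    "country": ["USA", "China", "USA", "USA",
                "USA", "Japan", "USA", "USA"],
})

industry = f500["industry"]          # step 1: the "industry" column as a Series
is_usa = f500["country"] == "USA"    # step 2: boolean mask, one True/False per row
usa_industries = industry[is_usa]    # step 3: keep only the rows where the mask is True
counts = usa_industries.value_counts()  # step 4: occurrences per industry, most frequent first
industry_usa = counts.head(2)        # step 5: the two most frequent industries

print(is_usa.tolist())
print(industry_usa)

# The same top-2 count using Counter from the standard library.
print(Counter(usa_industries).most_common(2))
```

The chained one-liner in the question does exactly these five steps without naming the intermediate results.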

NOTE: Interestingly, you can also swap the order and write f500[f500["country"] == "USA"]["industry"], which filters the rows first and then selects the column, and still get the same result.
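A quick way to convince yourself that the order does not matter is to compare the results directly. This is a small sketch on made-up data (the rows are an assumption, not the real f500). The .loc form is not part of the original line; it is included only as another common way to select rows and a column in a single step.

```python
import pandas as pd

# Hypothetical sample data standing in for f500.
f500 = pd.DataFrame({
    "industry": ["Retail", "Energy", "Retail", "Banking",
                 "Energy", "Retail", "Energy", "Retail"],
    "country": ["USA", "China", "USA", "USA",
                "USA", "Japan", "USA", "USA"],
})

mask = f500["country"] == "USA"

column_then_rows = f500["industry"][mask]   # select the column, then filter rows
rows_then_column = f500[mask]["industry"]   # filter rows, then select the column
with_loc = f500.loc[mask, "industry"]       # rows and column in one indexing call

# Series.equals checks that values and index labels match.
print(column_then_rows.equals(rows_then_column))
print(rows_then_column.equals(with_loc))
```

All three keep the original row labels, so the resulting Series are interchangeable for the value_counts() call that follows.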

Upvotes: 1
