Kendo96

Reputation: 15

Manipulate data in a DataFrame

I am very new to Python, so please bear with me. I have a DataFrame called df_staging (sample image) that summarizes the annual sales and income for multiple counties (A1, A2, A3, A4, etc.).

I want to do the following:

  1. Compute the sum and mean of sales and income for each county.
  2. For each county, plot the sales and income for each year, with one subplot per county.

Ideally, I could use a for loop to perform 1 and 2. I have tried (doesn't work):

ind_county = df_staging['County Name'].drop_duplicates()
ind_county = ind_county[:2]

for i in ind_county:
    ts_year = df_staging.loc[df_staging['County Name'] == ind_county[i]]
    ts_year.plot.bar(x = 'Year', y = 'Sales')

How would I accomplish this? Is it possible to do it without using a for loop? I am open to any suggestions and tips that anyone has.

Thank you.

Sample image

Upvotes: 0

Views: 66

Answers (1)

kpatucha

Reputation: 1

There is a neat method, df["col"].unique(), which accomplishes what you did with drop_duplicates.
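As a rough illustration of the difference, here is a minimal sketch. The sample values are made up (the real df_staging is only shown as an image); only the column names 'County Name', 'Year' and 'Sales' come from the question.

```python
import pandas as pd

# Made-up stand-in for df_staging; the values are illustrative only.
df_staging = pd.DataFrame({
    "County Name": ["A1", "A1", "A2", "A2"],
    "Year": [2019, 2020, 2019, 2020],
    "Sales": [100, 120, 80, 90],
})

# unique() returns an array of the distinct values, in order of first appearance.
codes = df_staging["County Name"].unique()

# drop_duplicates() returns a Series that keeps the original row labels
# (0 and 2 here), so it is indexed by row number, not by county code.
deduped = df_staging["County Name"].drop_duplicates()

print(list(codes))
print(list(deduped.index))
```

Both give the same county codes; the array from unique() is simply the lighter thing to loop over.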

When you loop through ind_county, the loop variable i is already the county code (e.g. A1). That's why ind_county[i] fails: it looks up the code as an index label, and the Series is indexed by row number. Simply replace ind_county[i] with i. You will then have a DataFrame for a given county, so you can plot it, calculate the mean (using the df["col"].mean() method), etc.
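Putting that together, one possible sketch is below. It assumes an 'Income' column exists alongside 'Sales' (the question mentions income but does not show the column name), and the sample values are made up. It also uses groupby for the sum and mean, which is not part of the fix above but avoids the loop for that step.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for df_staging; 'Income' is an assumed column name.
df_staging = pd.DataFrame({
    "County Name": ["A1", "A1", "A2", "A2"],
    "Year": [2019, 2020, 2019, 2020],
    "Sales": [100, 120, 80, 90],
    "Income": [10, 14, 8, 10],
})

# Sum and mean of each numeric column per county, without an explicit loop.
summary = df_staging.groupby("County Name")[["Sales", "Income"]].agg(["sum", "mean"])
print(summary)

# One subplot per county; squeeze=False keeps `axes` 2-D even for a single county.
counties = df_staging["County Name"].unique()
fig, axes = plt.subplots(nrows=len(counties), ncols=1, squeeze=False)

for ax, county in zip(axes.ravel(), counties):
    # `county` is already the code (e.g. "A1"), so compare against it directly.
    ts_year = df_staging.loc[df_staging["County Name"] == county]
    ts_year.plot.bar(x="Year", y=["Sales", "Income"], ax=ax, title=county)

fig.tight_layout()
```

Call plt.show() (or fig.savefig with a file name of your choice) afterwards to see the figure. If you only want the first two counties, as in your attempt, slice with counties[:2] before creating the subplots.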

Upvotes: 0
