MMM

Reputation: 435

Python pandas DataFrame: group by values and plot multiple graphs

I have a large pandas DataFrame like the one below, with 2,923,922 rows in total. I want to generate multiple line plots. GYEAR ranges from 1963 to 1999, COUNTRY is either Non-US or US, PATENT is the patent code, and CAT is a categorical value. I want GYEAR on the x-axis and the number of patents on the y-axis, with one plot showing lines for 'US' / 'Non-US' / Total and another plot showing lines for 'Other' / 'Mechanical' / 'Drugs & Medical'. How can I plot this?

    GYEAR  COUNTRY   PATENT              CAT
0    1963   Non-US  3070801            Other
1    1963       US  3070802            Other
2    1963       US  3070803            Other
3    1966       US  3070804            Other
4    1966       US  3070805            Other
5    1967       US  3070806            Other
6    1970       US  3070807  Drugs & Medical
7    1970       US  3070808  Drugs & Medical
8    1963       US  3070809            Other
9    1965       US  3070810            Other
10   1965       US  3070811            Other
11   1964       US  3070812            Other
12   1964       US  3070813            Other
13   1964       US  3070814       Mechanical
14   1964       US  3070815       Mechanical
15   1998       US  3070816       Mechanical
16   1998       US  3070817       Mechanical
17   1998       US  3070818            Other
18   1999       US  3070819            Other

[Images: sample 1, sample 2]

I tried the following code, but it did not work. Any advice would be appreciated.

us = df1[(df1['COUNTRY'] == 'US')]
nonus = df1[(df1['COUNTRY'] != 'US')]

plt.plot(us['GYEAR'], us['PATENT'], linewidth='4', color ='k',label='US')
plt.plot(nonus['GYEAR'], nonus['PATENT'], linewidth='1', color ='b',label='Non-US')

Upvotes: 3

Views: 1388

Answers (1)

jezrael

Reputation: 862521

I think you need crosstab to reshape the data into per-year counts, and then plot the result:

pd.crosstab(df['GYEAR'], df['CAT']).plot()

df2 = pd.crosstab(df['GYEAR'], df['COUNTRY'])
df2['Total'] = df2.sum(axis=1)
df2.plot()

An alternative solution is to aggregate with size and reshape with unstack:

df.groupby(['GYEAR','CAT']).size().unstack(fill_value=0).plot()


df2 = df.groupby(['GYEAR','COUNTRY']).size().unstack(fill_value=0)
df2['Total'] = df2.sum(axis=1)
df2.plot()
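
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the crosstab approach. It uses only a handful of rows copied from the question's sample, not the full data. The helper names count_by and plot_counts and the optional years argument are my own additions for illustration. The years argument is an assumption about what you may want: it adds years with no patents as zero rows, so the lines drop to 0 instead of jumping straight between the years that have data.

```python
import pandas as pd

# A few rows copied from the question's sample (the real frame has ~2.9M rows).
rows = [
    (1963, "Non-US", 3070801, "Other"),
    (1963, "US", 3070802, "Other"),
    (1963, "US", 3070803, "Other"),
    (1964, "US", 3070812, "Other"),
    (1964, "US", 3070814, "Mechanical"),
    (1970, "US", 3070807, "Drugs & Medical"),
    (1970, "US", 3070808, "Drugs & Medical"),
    (1998, "US", 3070816, "Mechanical"),
]
df = pd.DataFrame(rows, columns=["GYEAR", "COUNTRY", "PATENT", "CAT"])


def count_by(frame, column, years=None):
    """Count patents per GYEAR, with one output column per value of `column`.

    Hypothetical helper: `years` is optional and, if given, is used to
    reindex the result so that years without any patents appear as zeros.
    """
    counts = pd.crosstab(frame["GYEAR"], frame[column])
    if years is not None:
        counts = counts.reindex(years, fill_value=0)
    return counts


# Table 1: US / Non-US counts per year, plus a Total column.
by_country = count_by(df, "COUNTRY")
by_country["Total"] = by_country.sum(axis=1)

# Table 2: counts per category per year.
by_cat = count_by(df, "CAT")


def plot_counts(tables, titles):
    """Draw one line plot per count table, side by side (hypothetical helper)."""
    # Imported here so the counting part above runs without matplotlib.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, len(tables), figsize=(12, 4), squeeze=False)
    for ax, table, title in zip(axes[0], tables, titles):
        table.plot(ax=ax, title=title)
        ax.set_xlabel("GYEAR")
        ax.set_ylabel("Number of patents")
    return fig
```

Calling plot_counts([by_country, by_cat], ["By country", "By category"]) should give the two plots next to each other. To cover the whole period, pass years=range(1963, 2000) to count_by when building each table.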

Upvotes: 2
