Reputation: 435
I have a large pandas DataFrame like the one below, with 2,923,922 rows in total. I want to generate multiple line plots. GYEAR ranges from 1963 to 1999, COUNTRY is either 'Non-US' or 'US', PATENT is the patent code, and CAT is a categorical value. I want GYEAR on the x-axis and the number of patents on the y-axis, with one plot showing lines for 'US' / 'Non-US' / Total and another plot showing lines for 'Other' / 'Mechanical' / 'Drugs & Medical'. How can I plot this?
GYEAR COUNTRY PATENT CAT
0 1963 Non-US 3070801 Other
1 1963 US 3070802 Other
2 1963 US 3070803 Other
3 1966 US 3070804 Other
4 1966 US 3070805 Other
5 1967 US 3070806 Other
6 1970 US 3070807 Drugs & Medical
7 1970 US 3070808 Drugs & Medical
8 1963 US 3070809 Other
9 1965 US 3070810 Other
10 1965 US 3070811 Other
11 1964 US 3070812 Other
12 1964 US 3070813 Other
13 1964 US 3070814 Mechanical
14 1964 US 3070815 Mechanical
15 1998 US 3070816 Mechanical
16 1998 US 3070817 Mechanical
17 1998 US 3070818 Other
18 1999 US 3070819 Other
I tried the code below, but it did not work. Please give me some advice!
us = df1[(df1['COUNTRY'] == 'US')]
nonus = df1[(df1['COUNTRY'] != 'US')]
plt.plot(us['GYEAR'], us['PATENT'], linewidth='4', color ='k',label='US')
plt.plot(nonus['GYEAR'], nonus['PATENT'], linewidth='1', color ='b',label='Non-US')
Upvotes: 3
Views: 1388
Reputation: 862521
I think you need crosstab to reshape the data into per-year counts, then plot:
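# One line per category: rows are years, columns are CAT values, cells are counts
pd.crosstab(df['GYEAR'], df['CAT']).plot()
# One line per country, plus a Total column summed across countries
df2 = pd.crosstab(df['GYEAR'], df['COUNTRY'])
df2['Total'] = df2.sum(axis=1)
df2.plot()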
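To make the crosstab approach easier to try, here is a minimal self-contained sketch built from the first few sample rows in the question. The variable name `by_country`, the axis labels, and the `Agg` backend line are my own choices for illustration, not part of the original snippet.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import pandas as pd

# First eight rows of the sample shown in the question
df = pd.DataFrame({
    "GYEAR": [1963, 1963, 1963, 1966, 1966, 1967, 1970, 1970],
    "COUNTRY": ["Non-US", "US", "US", "US", "US", "US", "US", "US"],
    "PATENT": range(3070801, 3070809),
    "CAT": ["Other"] * 6 + ["Drugs & Medical"] * 2,
})

# Count rows per (year, country); years become the index, countries the columns
by_country = pd.crosstab(df["GYEAR"], df["COUNTRY"])
by_country["Total"] = by_country.sum(axis=1)

# DataFrame.plot draws one line per column, with the index (GYEAR) on the x-axis
ax = by_country.plot()
ax.set_xlabel("GYEAR")
ax.set_ylabel("Number of patents")
# In an interactive session, drop the Agg line above and call matplotlib.pyplot.show()
```

The key difference from plotting `us['PATENT']` directly is that the y-values here are row counts per year, whereas PATENT holds the patent codes themselves.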
An alternative solution is to aggregate with size and reshape with unstack:
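# size() counts rows per (year, category); unstack() moves the categories into columns
df.groupby(['GYEAR','CAT']).size().unstack(fill_value=0).plot()
# Same idea per country, plus a Total column summed across countries
df2 = df.groupby(['GYEAR','COUNTRY']).size().unstack(fill_value=0)
df2['Total'] = df2.sum(axis=1)
df2.plot()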
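It may help to look at the intermediate objects in the groupby variant. The sketch below uses a small made-up frame in the same shape as the question's data (the names `counts`, `wide`, and `same` are hypothetical) and checks that the reshaped counts match what crosstab produces.

```python
import pandas as pd

# Hypothetical mini data set with the same columns as the question's frame
df = pd.DataFrame({
    "GYEAR": [1964, 1964, 1964, 1998, 1998, 1998, 1999],
    "CAT": ["Other", "Mechanical", "Mechanical",
            "Mechanical", "Mechanical", "Other", "Other"],
})

# Series indexed by (GYEAR, CAT) holding the number of rows in each group
counts = df.groupby(["GYEAR", "CAT"]).size()

# Move CAT into the columns; year/category pairs with no rows are filled with 0
wide = counts.unstack(fill_value=0)

# Compare the counts cell by cell with the crosstab result
same = bool((wide.to_numpy() == pd.crosstab(df["GYEAR"], df["CAT"]).to_numpy()).all())
```

Without `fill_value=0`, missing year/category pairs would come back as NaN, which leaves gaps in the plotted lines.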
Upvotes: 2