Reputation: 31
I need some help plotting data points from a .csv file as a line graph.
I'm trying to plot the number of health personnel in the United States and Vietnam for the years 2005, 2010, 2015, and 2016. The x values will be those four years, and the y values will be the number of personnel.
The problem is, I don't know where to start! I'm not sure how to specify which data points to extract when the .csv file is so large. I'm used to .csv files containing very simple data:
0, 1
1, 1
2, 3
3, 5
4, 7
5, 8
The actual .csv file I'm trying to work with can be accessed here.
Any help will be appreciated!
Upvotes: 0
Views: 54
Reputation: 2421
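import pandas as pd
import matplotlib.pyplot as plt
url = "http://data.un.org/_Docs/SYB/CSV/SYB62_154_201906_Health%20Personnel.csv"
# The first line of the file is a title, so skip it and read the header from line 2
data = pd.read_csv(url, skiprows=1)
# Some data tidying
# Values contain thousands separators, so strip the commas before converting to numbers
data['Value'] = data['Value'].str.replace(',', "").astype(float)
# pandas labels the headerless country column 'Unnamed: 1'; give it a readable name
data = data.rename(columns={'Unnamed: 1': "Country"})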
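def plot_country_occupation(country, occupation):
    # Keep absolute counts only by dropping the "per 1000 population" series.
    # regex=False matches the text literally, so the parentheses are not treated as a regex group.
    rows = data[
        ~data['Series'].str.contains('(per 1000 population)', regex=False)
        & (data['Country'] == country)
        & (data['Series'].str.contains(occupation))
    ]
    rows.plot(x='Year', y='Value', title=" - ".join([country, occupation]))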
With the function, you can specify the country and the occupation.
plot_country_occupation('Viet Nam', 'Physicians')
plot_country_occupation('United States of America', 'Pharmacists')
plt.show()  # needed when running as a script; notebooks usually display the figures on their own
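If you want both countries on the same graph and only the years from the question, one option is to filter the rows and pivot them so that each country becomes a column. This is only a rough sketch: the helper names personnel_by_year and plot_personnel are made up for illustration, and the sample rows are placeholder numbers (not real UN figures) laid out like the tidied data so the snippet runs without downloading anything. It also assumes each country has at most one matching series per year, otherwise pivot will raise an error.

```python
import io
import pandas as pd

# Placeholder rows shaped like the tidied data (Country, Year, Series, Value).
# The values are invented for illustration only.
SAMPLE_CSV = """Country,Year,Series,Value
Viet Nam,2005,Health personnel: Physicians (number),10
Viet Nam,2010,Health personnel: Physicians (number),20
Viet Nam,2012,Health personnel: Physicians (number),25
Viet Nam,2015,Health personnel: Physicians (number),30
Viet Nam,2016,Health personnel: Physicians (number),40
Viet Nam,2016,Health personnel: Physicians (per 1000 population),0.5
United States of America,2005,Health personnel: Physicians (number),100
United States of America,2010,Health personnel: Physicians (number),200
United States of America,2015,Health personnel: Physicians (number),300
United States of America,2016,Health personnel: Physicians (number),400
United States of America,2016,Health personnel: Pharmacists (number),50
"""


def personnel_by_year(data, countries, occupation, years):
    """Return a table with one row per year and one column per country."""
    mask = (
        data["Country"].isin(countries)
        & data["Year"].isin(years)
        & data["Series"].str.contains(occupation, regex=False)
        # Drop the per-capita series so only absolute counts remain
        & ~data["Series"].str.contains("per 1000 population", regex=False)
    )
    return data[mask].pivot(index="Year", columns="Country", values="Value")


def plot_personnel(table, occupation):
    """Draw one line per country from the pivoted table."""
    import matplotlib.pyplot as plt  # imported here so the table step works without it

    ax = table.plot(marker="o", title=occupation)
    ax.set_xlabel("Year")
    ax.set_ylabel("Number of personnel")
    plt.show()


sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
table = personnel_by_year(
    sample,
    ["United States of America", "Viet Nam"],
    "Physicians",
    [2005, 2010, 2015, 2016],
)
print(table)
```

With the real file, pass the tidied data from above instead of sample, then call plot_personnel(table, 'Physicians') to draw the graph. If a country has no entry for one of the years, that cell will be NaN and the line will have a gap there.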
Upvotes: 1