Reputation: 1843
I have a dataframe Df
that looks like this:
Country Year
0 Australia, USA 2015
1 USA, Hong Kong, UK 1982
2 USA 2012
3 USA 1994
4 USA, France 2013
5 Japan 1988
6 Japan 1997
7 USA 2013
8 Mexico 2000
9 USA, UK 2005
10 USA 2012
11 USA, UK 2014
12 USA 1980
13 USA 1992
14 USA 1997
15 USA 2003
16 USA 2004
17 USA 2007
18 USA, Germany 2009
19 Japan 2006
20 Japan 1995
I want to make a bar chart for the Country
column, if i try this
Df.Country.value_counts().plot(kind='bar')
I get this plot
which is incorrect because it doesn't separate the countries. My goal is to obtain a bar chart that plots the count of each country in the column, but to achieve that, first i have to somehow split the string in each row (if needed) and then plot the data. I know i can use Df.Country.str.split(', ')
to split the strings, but if i do this i can't plot the data.
Anyone has an idea how to solve this problem?
Upvotes: 2
Views: 2210
Reputation: 879869
You could use the vectorized Series.str.split method to split the Country
s:
In [163]: df['Country'].str.split(r',\s+', expand=True)
Out[163]:
0 1 2
0 Australia USA None
1 USA Hong Kong UK
2 USA None None
3 USA None None
4 USA France None
...
If you stack this DataFrame to move all the values into a single column, then you can apply value_counts
and plot as before:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(
{'Country': ['Australia, USA', 'USA, Hong Kong, UK', 'USA', 'USA', 'USA, France', 'Japan', 'Japan', 'USA', 'Mexico', 'USA, UK', 'USA', 'USA, UK', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA, Germany', 'Japan', 'Japan'],
'Year': [2015, 1982, 2012, 1994, 2013, 1988, 1997, 2013, 2000, 2005, 2012, 2014, 1980, 1992, 1997, 2003, 2004, 2007, 2009, 2006, 1995]})
counts = df['Country'].str.split(r',\s+', expand=True).stack().value_counts()
counts.plot(kind='bar')
plt.show()
Upvotes: 6
Reputation:
new_df = pd.concat([Series(row['Year'], row['Country'].split(',')) for _, row in DF.iterrows()]).reset_index()
(DF is your old DF). this will give you one data point for each country name.
Hope this helps.
Cheers!
Upvotes: 1
Reputation: 109546
from collections import Counter
c = pd.Series(Counter(df.Country.str.split(',').sum()))
>>> c.plot(kind='bar', title='Country Count')
Upvotes: 2