Reputation: 321
I have a question similar to this one but I need some further steps. The thing is my file contains like 50k+ lines. Each line have 4 values "Indicator","Country","Date" and "value". I want to split my CSV based on country. I do not know how many countries there is so all countries with similar name should be in one CSV file and so on. The CSV file in not order either. I am using pandas and here is my code so far:
import pandas as pd
def read_csvfile():
df = pd.read_csv('ebola_data_db_format.csv', sep= ',')
#remove the unneeded columns
df = df[df['Country'] != "Guinea 2"]
df = df[df['Country'] != "Liberia 2"]
#reset the index
df.reset_index(drop=True, inplace=True)
print (df.head(10))
read_csvfile()
I want to be able to have a CSV file for every country so I can plot their data separately. Help please!
Upvotes: 1
Views: 3134
Reputation: 34046
You can use dictionary comprehension
here:
dfs_dict = {j: df[df['Country'] == j] for j in df['Country'].unique()}
For saving country-wise
data in country
cdv's, do this:
for k, v in dfs_dict.items():
v.to_csv('{}.csv'.format(k))
Upvotes: 2
Reputation: 150735
You can use groupby:
country_dfs = {k:v for k,v in df.groupby('Country')}
To save them in several csv files:
for k, v in df.groupby('Country'):
v.to_csv(f'{k}.csv')
or from country_dfs
:
for k, v in country_dfs.items():
v.to_csv(f'{k}.csv')
Upvotes: 3