Mohammed
Mohammed

Reputation: 321

Pandas split data frames into multiple csv's based on value from column

I have a question similar to this one but I need some further steps. The thing is my file contains like 50k+ lines. Each line have 4 values "Indicator","Country","Date" and "value". I want to split my CSV based on country. I do not know how many countries there is so all countries with similar name should be in one CSV file and so on. The CSV file in not order either. I am using pandas and here is my code so far:

import pandas as pd
def read_csvfile():
    df = pd.read_csv('ebola_data_db_format.csv', sep= ',')
    
    #remove the unneeded columns
    df = df[df['Country'] != "Guinea 2"]  
    df = df[df['Country'] != "Liberia 2"]
   
    #reset the index
    df.reset_index(drop=True, inplace=True)
    print (df.head(10))

read_csvfile()

I want to be able to have a CSV file for every country so I can plot their data separately. Help please!

Upvotes: 1

Views: 3134

Answers (2)

Mayank Porwal
Mayank Porwal

Reputation: 34046

You can use dictionary comprehension here:

dfs_dict = {j: df[df['Country'] == j] for j in df['Country'].unique()}

For saving country-wise data in country cdv's, do this:

for k, v in dfs_dict.items():
    v.to_csv('{}.csv'.format(k))

Upvotes: 2

Quang Hoang
Quang Hoang

Reputation: 150735

You can use groupby:

country_dfs = {k:v for k,v in df.groupby('Country')}

To save them in several csv files:

for k, v in df.groupby('Country'):
    v.to_csv(f'{k}.csv')

or from country_dfs:

for k, v in country_dfs.items():
    v.to_csv(f'{k}.csv')

Upvotes: 3

Related Questions