GoingMyWay
GoingMyWay

Reputation: 17468

Split dataframe into Excel files by column value

I have a DataFrame like this

>>> df
    id    name    score    subject
    0001   'bob'    100    'math'
    0001   'bob'     67    'science'
    0001   'bob'     63    'bio'
    0002  'jack'     67    'math'
    0002  'jack'     98    'science' 
    0002  'jack'     90    'bio'
    0003  'jack'     60    'math'
    0003  'jack'     78    'science' 
    0003  'rose'     87    'bio'

I want to filter every id's data into a new DataFrame and write to an Excel file based on its id. So, the above df will be filtered into 3 DataFrames whose ids are 0001, 0002 and 0003, and all the DataFrames will be written to individual excel files.

Upvotes: 8

Views: 31962

Answers (5)

cottontail
cottontail

Reputation: 23131

Pandas' groupby splits a dataframe according to a grouper. So simply groupby id and write each split df to an excel file.

for Id, d in df.groupby('id'):
    d.to_excel(f'Id_{Id}.xlsx')

This creates 3 excel files named "Id_1.xlsx", "Id_2.xlsx" and "Id_3.xlsx".

Similarly, one can create a dictionary of dfs (where the keys are ids)

dict([*df.groupby('id')])

or a list of dfs

[d for _, d in df.groupby('id')]

With OP's data, it looks like

enter image description here

Upvotes: 0

Pepe Alvarez
Pepe Alvarez

Reputation: 1624

First we have the data

data= { 
     "id":
     ['0001','0001', '0001', '0002', '0002', '0002', '0003', '0003', '0003'],
     "name":
     ['bob','bob','bob','jack','jack', 'jack', 'jack', 'jack',  'rose'],
     "score":
     [100,67,63,67,98,90,60,78,87],
     "subject":
     ['math','science','bio','math','science' ,'bio','math','science' ,'bio' ]
}

Then we process it

table = pd.DataFrame(data)
ids = table['id'].unique() # we get an array of unique ids
for i in ids:
    # You can refer to variables in the environment
    # by prefixing them with an '@' character
    idtable = table.query(" id == @i ") # we filter the dataframe by id
    title = f"table_id_{i}.csv" # we set a unique name for the csv file
    idtable.to_csv(title) # we save the filtered dataframe

Upvotes: 0

A2Ben415
A2Ben415

Reputation: 1

Needed to convert df row to (str) first, otherwise kept getting dtype errors.

df['sample']=df['sample'].apply(str) 

Upvotes: 0

Tasos
Tasos

Reputation: 7577

First, get a list of the unique ID values

uniquevalues = np.unique(df[['id']].values)

Then iterate on it and export each dataframe with those IDs in a CSV file

for id in uniquevalues:
    newdf = df[df['id'] == id]
    newdf.to_csv("dataframe "+id+".csv", sep='\t')

If you have only those three IDs, then you can just pass the for and do the same thing manually like

newdf = df[df['id'] == "0001"]
newdf.to_csv("dataframe0001.csv", sep='\t')

Upvotes: 12

Fabio Lamanna
Fabio Lamanna

Reputation: 21552

IIUC, on your example you can just filter the dataframe by id with:

df1 = df[df['id'] == 0001]

and the same for other id values.

Upvotes: 8

Related Questions