user6105633
user6105633

Reputation:

Split dataframe according to column value and export to different Excel worksheets

Source used before asked:

Pandas: Iterate through a list of DataFrames and export each to excel sheets

Splitting dataframe into multiple dataframes

I have managed to do all of this:

# sort the dataframe
df.sort(columns=['name'], inplace=True)
# set the index to be this and don't drop
df.set_index(keys=['name'], drop=False,inplace=True)
# get a list of names
names=df['name'].unique().tolist()
# now we can perform a lookup on a 'view' of the dataframe
joe = df.loc[df.name=='joe']
# now you can query all 'joes'

I have managed to make this work - joe = df.loc[df.name=='joe'] and it gave the exact result what I was looking for.

As solution to make it work for big amount of data I found this potential solution.

writer = pandas.ExcelWriter("MyData.xlsx", engine='xlsxwriter')
List = [Data , ByBrand]
for i in List:
        i.to_excel(writer, sheet_name= i)
writer.save()

Currently I have:

teacher_names = ['Teacher A', 'Teacher B', 'Teacher C']

df =

              ID   Teacher_name      Student_name
Teacher_name                                                         
Teacher A     1.0  Teacher A         Student 1 
Teacher A     NaN  Teacher A         Student 2  
Teacher B     0.0  Teacher B         Student 3 
Teacher C     2.0  Teacher C         Student 4 

If I use - test = df.loc[df.Teacher_name=='Teacher A'] - Will receive exact result.

Question: How to optimise that it will automatically save the "test" result to (for each teacher separate) excel file ( .to_excel(writer, sheet_name=Teacher_name ) with teacher name, and will do it for all the existing in the database teacher?

Upvotes: 2

Views: 8628

Answers (2)

jpp
jpp

Reputation: 164623

This should work for you. You were nearly there, you just need to iterate the names list and filter your dataframe each time.

names = df['name'].unique().tolist()

writer = pandas.ExcelWriter("MyData.xlsx", engine='xlsxwriter')

for myname in names:
    mydf = df.loc[df.name==myname]
    mydf.to_excel(writer, sheet_name=myname)
    
writer.close()

EDIT: After 1.5 version of Pandas you just need to use close() instead of save().

Upvotes: 5

Devarshi Mandal
Devarshi Mandal

Reputation: 733

@jpp, the text 'sheetname' is to be replaced with 'sheet_name'. Also once the 'name' variable is converted to list, upon running the for loop for creating multiple sheets based on unique name value, I get the following error:

InvalidWorksheetName: Invalid Excel character '[]:*?/\' in sheetname '['.

Alternate way of creating multiple worksheets (in a single excel file) based on a column value (through a function):

def writesheet(g):
    a=g['name'].tolist()[0]
    g.to_excel(writer,sheet_name = str(a),index=False)

df.groupby('name').apply(writesheet)
writer.save()

Source: How to split a large excel file into multiple worksheets based on their given ip address using pandas python

Upvotes: 0

Related Questions