Sorting data in Pandas to sheets with for cyclus - multiple data in one sheet

Question

I'm writing a script that schould be able to sort a large amount od data from excel and make some statistics, and need help with the sorting part...
I have a large excel with multiple sheets, each with list of products and their properties and need to sort data so each product is in one sheet. That I can do. However some products have different names, although they are the same, and I need them all to be in the same sheet for the correct statistic.

Based of the code example below, I have products named text1, text2, text3, ... ,text7. The duplicities are text2 = text3, text5 = text6.

What I already have are sheets with sorted data for
text1, text2, text3, text4, text5, text6, text7
named
'text1', 'text2', 'text3', 'text4', 'text5', 'text6', 'text7'

What I need are sheets with data for
text1, text2+text3, text4, text5+text6, text7
named
'text1', 'text2', 'text4', 'text5', 'text7'

I'm sorry for bad explaining, hope it makes sense.

I made even example of source-data.xls, and uploaded it here: https://www.dropbox.com/sh/aiqysx3gyxeuot9/AAAV6mqvvbw5TUIBvzuKCigka?dl=0

Is it even possible, or schould I rather change the way of thinking about the problem?

texts_to_find = ['text1', 'text2', 'text4', 'text5', 'text7']
sheets = ['a', 'b', 'c', 'd']

file = 'source-data.xls'
df = []
for sheetName in sheets:
    data = pd.read_excel(file, sheet_name ='{name}'.format(name=sheetName), usecols='B:P', skiprows=1)
    df.append(data)
df = pd.concat(df)

file_out_selected = 'selected-data.xlsx'
with pd.ExcelWriter(file_out_selected) as writer:
    for text in texts_to_find:
        df2 = df[df['column-name'] == text]
        df2.to_excel(writer, header=True, index=False, sheet_name ='{name}'.format(name=text))

Serge Ballesta · Accepted Answer

You must tell Python that more than one name has to go in the same sheet. A simple way is to setup a relation 1-N (list of lists) sheet_name -> column_names.

Code could become:

texts_to_find = (('text1', ['text1']),
                 ('text2', ['text2', 'text3']),
                 ('text4', ['text4']),
                 ('text5', ['text5', 'text6']),
                 ('text7', ['text7']))

...

file_out_selected = 'selected-data.xlsx'
with pd.ExcelWriter(file_out_selected) as writer:
    for text, texts in texts_to_find:
        df2 = df[df['column-name'].isin(texts)]
        df2.to_excel(writer, header=True, index=False, sheet_name=text)

Sorting data in Pandas to sheets with for cyclus - multiple data in one sheet

Answers (1)

Related Questions