Reputation: 5
I'm writing a script that schould be able to sort a large amount od data from excel and make some statistics, and need help with the sorting part...
I have a large excel with multiple sheets, each with list of products and their properties and need to sort data so each product is in one sheet. That I can do. However some products have different names, although they are the same, and I need them all to be in the same sheet for the correct statistic.
Based of the code example below, I have products named text1, text2, text3, ... ,text7. The duplicities are text2 = text3, text5 = text6.
What I already have are sheets with sorted data for
text1, text2, text3, text4, text5, text6, text7
named
'text1', 'text2', 'text3', 'text4', 'text5', 'text6', 'text7'
What I need are sheets with data for
text1, text2+text3, text4, text5+text6, text7
named
'text1', 'text2', 'text4', 'text5', 'text7'
I'm sorry for bad explaining, hope it makes sense.
I made even example of source-data.xls, and uploaded it here:
https://www.dropbox.com/sh/aiqysx3gyxeuot9/AAAV6mqvvbw5TUIBvzuKCigka?dl=0
Is it even possible, or schould I rather change the way of thinking about the problem?
texts_to_find = ['text1', 'text2', 'text4', 'text5', 'text7']
sheets = ['a', 'b', 'c', 'd']
file = 'source-data.xls'
df = []
for sheetName in sheets:
data = pd.read_excel(file, sheet_name ='{name}'.format(name=sheetName), usecols='B:P', skiprows=1)
df.append(data)
df = pd.concat(df)
file_out_selected = 'selected-data.xlsx'
with pd.ExcelWriter(file_out_selected) as writer:
for text in texts_to_find:
df2 = df[df['column-name'] == text]
df2.to_excel(writer, header=True, index=False, sheet_name ='{name}'.format(name=text))
Upvotes: 0
Views: 66
Reputation: 149025
You must tell Python that more than one name has to go in the same sheet. A simple way is to setup a relation 1-N (list of lists) sheet_name -> column_names.
Code could become:
texts_to_find = (('text1', ['text1']),
('text2', ['text2', 'text3']),
('text4', ['text4']),
('text5', ['text5', 'text6']),
('text7', ['text7']))
...
file_out_selected = 'selected-data.xlsx'
with pd.ExcelWriter(file_out_selected) as writer:
for text, texts in texts_to_find:
df2 = df[df['column-name'].isin(texts)]
df2.to_excel(writer, header=True, index=False, sheet_name=text)
Upvotes: 0