ah bon
ah bon

Reputation: 10021

Split dataframe by multiple columns into multiple excels

Consider the following dataframe df:

df = pd.DataFrame(dict(
        Id = [1, 2, 3, 3],
        Country=['jp', 'cn', 'uk', 'uk'],
        Sales = [5, 3, 3, 4]
    ))

print(df)
   Id Country  Sales
0   1      jp      5
1   2      cn      3
2   3      uk      3
3   3      uk      4

I want to iterate and groupby columns Id and Countryand write to excel:

for n, g in df.groupby(['Id', 'Country']):
    print(n)
    # `n` is the group name, which will be the country
    g.to_excel('{}.xlsx'.format(n), index = False)

It will create files: ["(1, 'jp').xlsx", "(2, 'cn').xlsx", "(3, 'uk').xlsx"], but I want the format like: ["1-jp.xlsx", "2-cn.xlsx", "3-uk.xlsx"].

How can I get it in Pandas? Thanks.

Upvotes: 1

Views: 30

Answers (2)

Tom Ron
Tom Ron

Reputation: 6181

for n, g in df.groupby(['Id', 'Country']):
    print(n)
    g.to_excel('{}.xlsx'.format("_".join([str(x) for x in n]), index = False)

Upvotes: 1

jezrael
jezrael

Reputation: 862661

You can unpack tuple to variables a and b and pass to format function:

for (a, b), g in df.groupby(['Id', 'Country']):
    print(a, b)
    # `n` is the group name, which will be the country
    g.to_excel('{}_{}.xlsx'.format(a, b), index = False)

Or select tuples by position by indexing:

for n, g in df.groupby(['Id', 'Country']):
    print(n)
    # `n` is the group name, which will be the country
    g.to_excel(f'{n[0]}_{n[1]}.xlsx'.format(n), index = False)

Upvotes: 1

Related Questions