badideas

Reputation: 3557

Is there a way to auto-adjust Excel column widths with pandas.ExcelWriter?

I am being asked to generate some Excel reports. I am currently using pandas quite heavily for my data, so naturally I would like to use the pandas.ExcelWriter method to generate these reports. However, the fixed column widths are a problem.

The code I have so far is simple enough. Say I have a dataframe called df:

writer = pd.ExcelWriter(excel_file_path, engine='openpyxl')
df.to_excel(writer, sheet_name="Summary")

I was looking over the pandas docs, and I don't really see any options to set column widths. Is there a trick to make it such that the columns auto-adjust to the data? Or is there something I can do after the fact to the xlsx file to adjust the column widths?

(I am using the OpenPyXL library, and generating .xlsx files - if that makes any difference.)

Upvotes: 223

Views: 245221

Answers (24)

Angel

Reputation: 2875

Simple, fast and beautiful solution (saves the data as a native Excel table):

import pandas as pd


def to_excel(fpath: str, df: pd.DataFrame):
    """Saves DataFrame as native Excel table autofitting columns"""
    writer = pd.ExcelWriter(fpath, engine="xlsxwriter")
    df.to_excel(writer, sheet_name="Sheet1", startrow=1, header=False, index=False)

    worksheet = writer.sheets["Sheet1"]
    (rows, cols) = df.shape
    column_settings = [{"header": column} for column in df.columns]

    worksheet.add_table(0, 0, rows, cols - 1, {"columns": column_settings})
    worksheet.set_column(0, cols - 1, 1)
    worksheet.autofit()

    writer.close()


data = [
    {"Name": "Angel", "email": "[email protected]", "Position": "Data Engineer"},
    {
        "Name": "Very Long Name on Purpose to check column autofit",
        "email": "[email protected]",
        "Position": "ML Engineer",
    },
]
df = pd.DataFrame(data)
to_excel("example.xlsx", df)


Necessary libraries:

pip install pandas
pip install xlsxwriter

Upvotes: 1

link89

Reputation: 1803

I have implemented a method based on this answer: https://stackoverflow.com/a/61617835/3099733, with a small improvement that takes multi-line strings into account.

def excel_autowidth(df, sheet):
    def width(s):
        s = s.strip()
        if not s:
            return 0
        return max([len(str(x)) for x in s.splitlines()])

    for idx, col in enumerate(df):
        series = df[col]
        max_len = max((series.astype(str).map(width).max(), len(str(series.name)))) + 1
        sheet.set_column(idx, idx, max_len)

Usage:

# 'options' in engine_kwargs and set_column() are XlsxWriter features, so request that engine explicitly
with pd.ExcelWriter(out_excel, engine='xlsxwriter', engine_kwargs={'options': {'strings_to_urls': False}}) as writer:
    writer.book.formats[0].set_text_wrap()  # type: ignore

    group_df.to_excel(writer, sheet_name='groups', index=False)
    candidate_df.to_excel(writer, sheet_name='candidates', index=False)

    excel_autowidth(group_df, writer.sheets['groups'])
    excel_autowidth(candidate_df, writer.sheets['candidates'])
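To see why splitting on line breaks matters, here is a minimal stdlib-only sketch comparing a naive len() with the widest-line measure that width() uses. A wrapped cell displays each line separately, so only its longest line needs to fit. The helper names naive_width and widest_line are hypothetical, not part of pandas or XlsxWriter.

```python
def naive_width(text: str) -> int:
    """Length of the whole string, newline characters included."""
    return len(text)


def widest_line(text: str) -> int:
    """Length of the longest line, which is what a wrapped cell actually needs."""
    text = text.strip()
    if not text:
        return 0
    return max(len(line) for line in text.splitlines())


cell = "first line\nsecond, longer line\nend"
print(naive_width(cell))  # counts every character in the cell
print(widest_line(cell))  # counts only the longest line
```

For multi-line cells, the naive measure overestimates the needed width by roughly the length of all the other lines combined.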

Upvotes: 0

Yauheni Piatrouski

Reputation: 121

Please try to use worksheet.autofit() (requires pip install XlsxWriter>=3.0.6).

Full solution for pandas as per this comment:

# Although the xlsxwriter engine is the default it is good to be explicit
# so it is clear what breaks if another engine is installed
with pd.ExcelWriter('data.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='summary')
    writer.sheets['summary'].autofit()

P.S. I'm new to writing answers, so apologies for the brevity.
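Because autofit() only exists in newer XlsxWriter releases, you may want to check the installed version before calling it. Here is a small sketch that assumes the 3.0.6 minimum stated above and a plain dotted version string. The helper names are hypothetical, and pre-release suffixes are not handled.

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn a plain release string such as '3.0.6' into (3, 0, 6).

    Assumes a simple dotted numeric version; suffixes like 'rc1' are not handled.
    """
    return tuple(int(part) for part in text.split(".")[:3])


def supports_autofit(version_text: str, minimum=(3, 0, 6)) -> bool:
    """True if the given XlsxWriter version is at least `minimum`."""
    return parse_version(version_text) >= minimum


def installed_xlsxwriter_supports_autofit() -> bool:
    """Check the installed XlsxWriter distribution, returning False if it is absent."""
    try:
        return supports_autofit(metadata.version("XlsxWriter"))
    except metadata.PackageNotFoundError:
        return False
```

Tuple comparison handles short versions correctly: (3, 0) sorts before (3, 0, 6), so "3.0" is reported as too old.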

Upvotes: 12

Giorgos Myrianthous

Reputation: 39840

Dynamically adjust all the column lengths

writer = pd.ExcelWriter('/path/to/output/file.xlsx') 
df.to_excel(writer, sheet_name='sheetName', index=False, na_rep='NaN')

for column in df:
    column_length = max(df[column].astype(str).map(len).max(), len(column))
    col_idx = df.columns.get_loc(column)
    writer.sheets['sheetName'].set_column(col_idx, col_idx, column_length)

writer.close()  # writer.save() was deprecated in pandas 1.5 and removed in pandas 2.0

Manually adjust a column using Column Name

col_idx = df.columns.get_loc('columnName')
writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)

Manually adjust a column using Column Index

writer.sheets['sheetName'].set_column(col_idx, col_idx, 15)  # col_idx: zero-based integer, e.g. 2 for column C

In case any of the above is failing with

AttributeError: 'Worksheet' object has no attribute 'set_column'

make sure to install xlsxwriter:

pip install xlsxwriter

and use it as the engine:

writer = pd.ExcelWriter('/path/to/output/file.xlsx', engine='xlsxwriter') 
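The AttributeError appears because set_column() is an XlsxWriter method, while openpyxl worksheets expose per-column settings through column_dimensions. If you need code that works with either engine, here is a minimal sketch of a hypothetical set_width() helper that dispatches on which attribute the worksheet has. The openpyxl import happens only when that branch is taken.

```python
def set_width(worksheet, col_idx: int, width: float) -> None:
    """Set the width of one zero-based column on an XlsxWriter or openpyxl worksheet.

    Hypothetical helper: XlsxWriter worksheets have set_column(), while
    openpyxl worksheets expose column_dimensions keyed by column letter.
    """
    if hasattr(worksheet, "set_column"):
        # XlsxWriter: first and last column indices are zero-based
        worksheet.set_column(col_idx, col_idx, width)
    else:
        # openpyxl: get_column_letter() expects a 1-based column number
        from openpyxl.utils import get_column_letter

        worksheet.column_dimensions[get_column_letter(col_idx + 1)].width = width
```

With this helper, the loop above could call set_width(writer.sheets['sheetName'], col_idx, column_length) regardless of the engine.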

For a more comprehensive explanation, you can read the article How to Auto-Adjust the Width of Excel Columns with Pandas ExcelWriter on Towards Data Science (TDS).

Upvotes: 77

xjcl

Reputation: 15309

My helper function is based on counting characters, which works out on average but can look off if your text is "iiiiiiii" (narrow) or "WWWWWWWW" (wide). It also assumes floats will usually be formatted with 2 decimals.

For each column, it takes the widest of the column header, the column contents, and Excel's default column width of 8.43 characters. Pandas exports column headers and index columns in bold, which is accounted for by multiplying their width by 1.2.

def auto_width(writer, sheet, df, index=True):
    """Automatically calculate and set column widths of DataFrame written into Excel"""

    def get_width(x, is_bold):
        if isinstance(x, float):
            x = f"{x:.2f}"  # will usually appear with 2 decimals anyway
        return len(str(x)) * (1.2 if is_bold else 1) + 2

    # to_excel() writes the index by default (even an unnamed one), so include it
    # whenever the DataFrame was written with index=True
    df2 = df.reset_index() if index else df
    n_index_cols = df.index.nlevels if index else 0
    for col_idx, col in enumerate(df2):  # assume DF starts at the left-most column
        is_index = col_idx < n_index_cols  # pandas writes index cells in bold
        # max of (width of each entry in this column, 8.43, column header/title width)
        col_width = max(
            *df2[col].apply(get_width, args=(is_index,)),
            8.43,
            get_width(col, True),
        )
        writer.sheets[sheet].set_column(col_idx, col_idx, col_width)

Usage:

with pd.ExcelWriter(..., engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="My Sheet")
    auto_width(writer, "My Sheet", df)
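The estimate inside auto_width() can be pulled out into a pure-Python sketch, which makes the heuristic easy to check without writing a file. The constants mirror the answer's assumptions (an 8.43-character default, a 1.2 factor for bold text, and 2 characters of padding). The function names are hypothetical.

```python
DEFAULT_WIDTH = 8.43  # Excel's default column width, in characters
BOLD_FACTOR = 1.2     # rough allowance for bold header/index text
PADDING = 2           # extra characters added to every estimate


def cell_width(value, is_bold=False) -> float:
    """Estimated width of one cell: character count, floats shown with 2 decimals."""
    if isinstance(value, float):
        value = f"{value:.2f}"
    return len(str(value)) * (BOLD_FACTOR if is_bold else 1) + PADDING


def column_width(header, values, header_bold=True) -> float:
    """Widest of the default width, the (bold) header, and every value."""
    return max([DEFAULT_WIDTH, cell_width(header, header_bold)]
               + [cell_width(v) for v in values])
```

Short columns fall back to the 8.43 default, while a long header such as "description" is widened by the bold factor.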

Upvotes: 0

Wagner Montanini

Reputation: 21

The function 'autosize_to_excel' takes a DataFrame and saves it to an Excel file, adjusting the column widths to fit the data.

import pandas as pd
from openpyxl.utils import get_column_letter


def autosize_to_excel(filename: str, df: pd.DataFrame, sheet_name: str = 'sheetName', **kwargs):
    """
    The function 'autosize_to_excel' takes a DataFrame and saves it to an Excel file, adjusting the
    column widths to fit the data.

    :param filename: The name of the Excel file that will be created or overwritten with the data
    from the DataFrame
    :type filename: str
    :param df: The pandas DataFrame containing the data you want to write to the Excel file
    :type df: pd.DataFrame
    :param sheet_name: The name of the sheet the DataFrame will be written to; defaults to 'sheetName'
    :type sheet_name: str (optional)
    :param kwargs: Extra keyword arguments passed on to DataFrame.to_excel
    """
    # The index is written by default, so the data columns start after it
    offset = df.index.nlevels if kwargs.get('index', True) else 0
    try:
        # column_dimensions is an openpyxl feature, so request that engine explicitly
        with pd.ExcelWriter(filename, engine='openpyxl') as writer:
            df.to_excel(writer, sheet_name=sheet_name, **kwargs)

            for column in df:
                column_length = max(df[column].astype(str).map(len).max(), len(str(column)))
                column_letter = get_column_letter(df.columns.get_loc(column) + offset + 1)  # Get the column letter
                writer.sheets[sheet_name].column_dimensions[column_letter].width = column_length + 2
    except Exception as err:
        print(err)

Upvotes: 2

Mounesh

Reputation: 744

Since you are using the openpyxl engine, you can try the solution below, which adjusts the column widths automatically:

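from openpyxl.utils import get_column_letter

sheet = writer.sheets['Summary']  # the openpyxl worksheet written by df.to_excel
for column_cells in sheet.columns:
    # Empty cells hold None, which should not count as the 4 characters of "None"
    new_column_length = max(len(str(cell.value)) if cell.value is not None else 0 for cell in column_cells)
    new_column_letter = get_column_letter(column_cells[0].column)
    if new_column_length > 0:
        sheet.column_dimensions[new_column_letter].width = new_column_length * 1.23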

Upvotes: 1

papadulo

Reputation: 13

Attention: many of the previous answers use outdated methods and clumsy workarounds. For example, writer.save() produces this warning:

FutureWarning: save is not part of the public API, usage can give unexpected results and will be removed in a future version
  writer.save()

I found an easy solution to this old but important problem, using the UliPlot package:

from UliPlot.XLSX import auto_adjust_xlsx_column_width

with pd.ExcelWriter("example.xlsx") as writer:
    df.to_excel(writer, sheet_name="MySheet")
    auto_adjust_xlsx_column_width(df, writer, sheet_name="MySheet", margin=1)

That is all you need.

The solution is not mine, so credit goes to the author of UliPlot; you can thank him.

Upvotes: 0

Jay Jara

Reputation: 99

There are a lot of valid solutions here. I think the easiest and cleanest way is to use the pywin32 package, whose COM interface closely mirrors Excel VBA (this requires Windows with Excel installed). The Range.AutoFit method takes care of this issue. Example below:

import win32com.client as win32

xlApp = win32.Dispatch('Excel.Application')
wb = xlApp.Workbooks.Open(r'C:\path\to\file.xlsx')  # replace with the full path to your Excel file

ws = wb.Worksheets('Summary')  # replace with the name of the worksheet to adjust
ws.Columns.AutoFit()

wb.Save()
wb.Close()
xlApp.Quit()

Note: the Worksheet.Columns property returns a Range object, and AutoFit is a method of the Range object.

Upvotes: 1

mouwsy

Reputation: 1933

Yes, there is something you can do to the xlsx file afterwards to auto-adjust the column widths: use xlwings to autofit the columns. It's a pretty simple solution; see the last six lines of the example code. The advantage of this approach is that you don't have to worry about font size, font type or anything else. Requirement: an Excel installation.

import pandas as pd
import xlwings as xw

path = r"test.xlsx"

# Export your dataframe in question.
df = pd.DataFrame({"id": [1, 2], "description": ["short", "a considerably longer text value"]})  # example data
df.to_excel(path)

# Autofit all columns with xlwings.
with xw.App(visible=False) as app:
    wb = xw.Book(path)

    for ws in wb.sheets:
        ws.autofit(axis="columns")

    wb.save(path)
    wb.close()

Upvotes: 5

Crispy13

Reputation: 369

An openpyxl version based on @alichaudry's code.
The code 1) loads an excel file, 2) adjusts column widths and 3) saves it.

import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter


def auto_adjust_column_widths(excel_file: str, extra_space: int = 1) -> None:
    """
    Adjusts column widths of the excel file and replaces it with the adjusted one.
    Adjusting columns is based on the lengths of columns values (including column names).
    Parameters
    ----------
    excel_file :
        Path of the Excel file whose column widths should be adjusted.

    extra_space :
        extra column width in addition to the value-based-widths
    """
    wb = load_workbook(excel_file)

    for ws in wb:
        # Empty cells come back as None; blank them so they don't count as "None"
        df = pd.DataFrame(ws.values).fillna("")

        # Longest string per column (applymap is deprecated and iteritems was removed in pandas 2.x)
        widths = df.astype(str).apply(lambda col: col.str.len()).max(axis=0) + extra_space
        for i, width in widths.items():
            ws.column_dimensions[get_column_letter(i + 1)].width = width

    wb.save(excel_file)
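Converting each sheet to a DataFrame is optional. The same measurement can be sketched in plain Python over the row tuples that openpyxl's ws.values yields, skipping empty cells. The function name widths_from_rows is hypothetical.

```python
from itertools import zip_longest


def widths_from_rows(rows, extra_space=1):
    """Compute one width per column from rows of cell values.

    `rows` is any iterable of tuples, e.g. openpyxl's `ws.values`; empty cells
    (None) count as zero characters instead of the 4 characters of "None".
    """
    widths = []
    for column in zip_longest(*rows, fillvalue=None):
        lengths = [len(str(v)) for v in column if v is not None]
        widths.append(max(lengths, default=0) + extra_space)
    return widths
```

Each result could then be assigned with ws.column_dimensions[get_column_letter(i + 1)].width = width, exactly as in the function above.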

Upvotes: 0

Imperial_J

Reputation: 376

I may be a bit late to the party, but this code works when using openpyxl as your engine (sometimes pip install xlsxwriter won't solve the issue). It works like a charm; edit any part as you wish.

import functools
from decimal import Decimal

import openpyxl.utils.cell


def text_length(text):
    """
    Get the effective text length in characters, taking into account newlines
    """
    if text is None:
        return 0
    lines = str(text).split("\n")
    return max(len(line) for line in lines)

def _to_str_for_length(v, decimals=3):
    """
    Like str() but rounds decimals to predefined length
    """
    if isinstance(v, float):
        # Round to [decimal] places; format 'f' avoids exponent notation such as '1E+2'
        return format(Decimal(v).quantize(Decimal('1.' + '0' * decimals)).normalize(), 'f')
    else:
        return str(v)


def auto_adjust_xlsx_column_width(df, writer, sheet_name, margin=3, length_factor=1.0, decimals=3, index=False):

    sheet = writer.sheets[sheet_name]
    _to_str = functools.partial(_to_str_for_length, decimals=decimals)
    # Compute & set column width for each column
    for column_name in df.columns:
        # Convert the values of the column to strings and select the longest
        column_length = max(df[column_name].apply(_to_str).map(text_length).max(), text_length(column_name)) + 5
        # Get index of column in XLSX
        # Column index is +1 if we also export the index column
        col_idx = df.columns.get_loc(column_name)
        if index:
            col_idx += 1
        # Set width of column to (column_length + margin)
        sheet.column_dimensions[openpyxl.utils.cell.get_column_letter(col_idx + 1)].width = column_length * length_factor + margin
    # Compute column width of index column (if enabled)
    if index:  # If the index column is being exported
        index_length = max(df.index.map(_to_str).map(text_length).max(), text_length(df.index.name))
        sheet.column_dimensions["A"].width = index_length * length_factor + margin
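Decimal.normalize() on its own renders round numbers in exponent notation (100.0 becomes '1E+2'), which undercounts the width the number actually takes in the sheet. Here is a small stdlib sketch of a rounding helper that formats with 'f' instead. The name float_to_display_str is hypothetical, and the helper assumes magnitudes well within Decimal's default 28-digit precision.

```python
import math
from decimal import Decimal


def float_to_display_str(v: float, decimals: int = 3) -> str:
    """Round a float to `decimals` places and drop trailing zeros, without exponent notation."""
    if not math.isfinite(v):
        return str(v)  # nan/inf cannot be quantized to a fixed exponent
    rounded = Decimal(v).quantize(Decimal("1." + "0" * decimals)).normalize()
    return format(rounded, "f")  # 'f' keeps 100 as '100' rather than '1E+2'
```

Measuring len(float_to_display_str(v)) therefore gives 3 for 100.0, where str() of the normalized Decimal would give 4.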

Upvotes: 0

gijsvandinther

Reputation: 11

This function works for me, and it also fixes the index width:

def write_to_excel(writer, X, sheet_name, sep_only=False):
    # writer = ExcelWriter object (xlsxwriter engine, since set_column is used)
    # X = dataframe
    # sheet_name = name of sheet
    # sep_only = True: write only as a separate Excel file; False: also write as a sheet to the writer object
    # output_folder and prefix_excel_save are assumed to be module-level globals defined elsewhere
    if sheet_name == "":
        print("specify sheet_name!")
    else:
        X.to_excel(f"{output_folder}{prefix_excel_save}_{sheet_name}.xlsx")
        print(f'{sheet_name} is written as separate file')
        if not sep_only:
            X.to_excel(writer, sheet_name=sheet_name)

            # fix column widths
            worksheet = writer.sheets[sheet_name]  # pull worksheet object
            for idx, col in enumerate(X.columns):  # loop through all columns
                series = X[col]
                max_len = max((
                    series.astype(str).map(len).max(),  # len of largest item
                    len(str(series.name))  # len of column name/header
                    )) + 1  # adding a little extra space
                worksheet.set_column(idx + 1, idx + 1, max_len)  # set column width (+1 because the index occupies column 0)

            # fix index width
            max_len = pd.Series(X.index.values).astype(str).map(len).max() + 1
            worksheet.set_column(0, 0, max_len)
            print(f'{sheet_name} is written as sheet')
    return writer

Example call:

writer = write_to_excel(writer, dataframe, "Statistical_Analysis")
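The "+1" above generalizes: with index=True (the pandas default), DataFrame.to_excel() writes the index into the first columns, one per index level, so every data column shifts right by that amount. Here is a tiny sketch of the mapping, using a hypothetical helper name.

```python
def data_column_positions(n_data_columns: int, index_levels: int = 1, index: bool = True):
    """Zero-based worksheet columns that DataFrame.to_excel() uses for the data columns.

    With index=True the index occupies the first `index_levels` columns,
    so every data column shifts right by that amount.
    """
    offset = index_levels if index else 0
    return [offset + i for i in range(n_data_columns)]
```

For a DataFrame with a two-level MultiIndex, the offset would be df.index.nlevels == 2 rather than a hard-coded 1.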

Upvotes: 1

Michel Kluger

Reputation: 173

You can solve the problem by calling the following function, where df is the DataFrame whose column widths you want to fit and sheetname is the Excel sheet where the changes should take place.

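def auto_width_columns(df, sheetname):
    # writer is the pd.ExcelWriter (xlsxwriter engine) from the enclosing scope;
    # assumes df was written with index=False, so column 0 holds the first data column
    worksheet = writer.sheets[sheetname]

    for i, col in enumerate(df.columns):
        column_len = max(df[col].astype(str).str.len().max(), len(str(col)) + 2)
        worksheet.set_column(i, i, column_len)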

Upvotes: 4

rafat.ch

Reputation: 113

At work, I am always writing DataFrames to Excel files, so instead of writing the same code over and over, I created a module. Now I just import it and use it to write and format the Excel files. There is one downside, though: it takes a long time if the DataFrame is very large. Here is the code:

import os

import pandas as pd


def result_to_excel(output_name, dataframes_list, sheet_names_list, output_dir):
    out_path = os.path.join(output_dir, output_name)
    writerReport = pd.ExcelWriter(out_path, engine='xlsxwriter',
                    datetime_format='yyyymmdd', date_format='yyyymmdd')
    workbook = writerReport.book
    # Add a header format (created once and reused for every sheet).
    header_format = workbook.add_format({
        'bold': True,
        'border': 1,
        'fg_color': '#0000FF',
        'font_color': 'white'})
    # loop through the list of dataframes to save every dataframe into a new sheet in the excel file
    for sheet_name, dataframe in zip(sheet_names_list, dataframes_list):
        dataframe.to_excel(writerReport, sheet_name=sheet_name, index=False, startrow=0)
        # Write the column headers with the defined format.
        worksheet = writerReport.sheets[sheet_name]
        for col_num, col_name in enumerate(dataframe.columns.values):
            worksheet.write(0, col_num, col_name, header_format)
        worksheet.autofilter(0, 0, 0, len(dataframe.columns) - 1)
        worksheet.freeze_panes(1, 0)
        # loop through the columns in the dataframe to get the width of the column
        for j, col in enumerate(dataframe.columns):
            max_width = max([len(str(s)) for s in dataframe[col].values] + [len(col) + 2])
            # cap the width so that columns don't get too wide
            if max_width > 50:
                max_width = 50
            worksheet.set_column(j, j, max_width)
    writerReport.close()  # writer.save() was removed in pandas 2.0
    return out_path
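The width rule in that loop (longest value or header plus 2, capped at 50) can be isolated as a small pure function, which makes the cap easy to test. This is a sketch with a hypothetical name; min() plays the role of the if-statement.

```python
def capped_width(header, values, cap=50, header_padding=2):
    """Longest value or header (+padding), but never wider than `cap` characters."""
    longest = max([len(str(v)) for v in values] + [len(str(header)) + header_padding])
    return min(longest, cap)
```

Cells longer than the cap are cut off visually unless they are given a wrapping format (XlsxWriter's text_wrap format property).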

Upvotes: 6

AsafSH

Reputation: 695

There is a nice package that I started to use recently called StyleFrame.

It takes a DataFrame and lets you style it very easily.

By default, the column widths are auto-adjusted.

For example:

from StyleFrame import StyleFrame
import pandas as pd

df = pd.DataFrame({'aaaaaaaaaaa': [1, 2, 3], 
                   'bbbbbbbbb': [1, 1, 1],
                   'ccccccccccc': [2, 3, 4]})
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(excel_writer=excel_writer, row_to_add_filters=0,
            columns_and_rows_to_freeze='B2')
excel_writer.save()

You can also set the column widths explicitly:

sf.set_column_width(columns=['aaaaaaaaaaa', 'bbbbbbbbb'],
                    width=35.3)

UPDATE 1

In version 1.4, a best_fit argument was added to StyleFrame.to_excel. See the documentation.

UPDATE 2

Here's a sample of code that works for StyleFrame 3.x.x

from styleframe import StyleFrame
import pandas as pd

columns = ['aaaaaaaaaaa', 'bbbbbbbbb', 'ccccccccccc', ]
df = pd.DataFrame(data={
        'aaaaaaaaaaa': [1, 2, 3, ],
        'bbbbbbbbb': [1, 1, 1, ],
        'ccccccccccc': [2, 3, 4, ],
    }, columns=columns,
)
excel_writer = StyleFrame.ExcelWriter('example.xlsx')
sf = StyleFrame(df)
sf.to_excel(
    excel_writer=excel_writer, 
    best_fit=columns,
    columns_and_rows_to_freeze='B2', 
    row_to_add_filters=0,
)
excel_writer.close()  # ExcelWriter.save() was removed in pandas 2.0

Upvotes: 39

jack1536

Reputation: 181

I found it more useful to adjust the column width based on the column header rather than the column content.

Using df.columns.values.tolist() I generate a list of the column headers and use the lengths of these headers to determine the width of the columns.

See full code below:

import pandas as pd
import xlsxwriter

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
df.to_excel(writer, index=False, sheet_name=sheetname)

workbook = writer.book # Access the workbook
worksheet= writer.sheets[sheetname] # Access the Worksheet

header_list = df.columns.values.tolist() # Generate list of headers
for i in range(0, len(header_list)):
    worksheet.set_column(i, i, len(header_list[i])) # Set column widths based on len(header)

writer.close()  # Save the Excel file (writer.save() was removed in pandas 2.0)
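The trade-off between header-based and content-based widths is easy to see with plain lists. Here is a sketch with hypothetical helper names that assumes every row has as many values as there are headers.

```python
def header_widths(headers):
    """Width per column from the header text alone."""
    return [len(str(h)) for h in headers]


def content_widths(headers, rows):
    """Width per column from the header and every value in that column.

    Assumes each row has exactly one value per header.
    """
    widths = header_widths(headers)
    for row in rows:
        for i, value in enumerate(row):
            widths[i] = max(widths[i], len(str(value)))
    return widths


headers = ["id", "email"]
rows = [(1, "someone@example.com"), (2, "x@y.z")]
```

Header-only widths keep the sheet compact but truncate long values such as the email address; content-based widths show every value at the cost of wider columns.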

Upvotes: 8

Ashu007

Reputation: 795

You can do this with pandas and XlsxWriter; the code below works in Python 3.x. For more details on using XlsxWriter with pandas, this link might be useful: https://xlsxwriter.readthedocs.io/working_with_pandas.html

import pandas as pd
writer = pd.ExcelWriter(excel_file_path, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Summary")
workbook = writer.book
worksheet = writer.sheets["Summary"]
#set the column width as per your requirement
worksheet.set_column('A:A', 25)
writer.close()  # writer.save() was removed in pandas 2.0

Upvotes: 18

Ssubrat Rrudra

Reputation: 960

from openpyxl.utils import get_column_letter

# .. (setup omitted in the original: _ws is the openpyxl worksheet to adjust)
for col in _ws.columns:
    max_length = 0
    col_name = get_column_letter(col[0].column)  # also handles multi-letter columns such as "AA"
    for cell in col:
        if cell.value is not None:
            max_length = max(max_length, len(str(cell.value)))
    adjusted_width = max_length + 2
    _ws.column_dimensions[col_name].width = adjusted_width
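Excel column letters are a bijective base-26 numbering (A to Z, then AA, AB, and so on), so taking a single letter from a cell coordinate breaks once a sheet reaches column AA. Here is a stdlib-only sketch of the 1-based number-to-letter mapping that openpyxl's get_column_letter performs. The name column_letter is hypothetical.

```python
def column_letter(n: int) -> str:
    """1-based column number to Excel letters: 1 -> 'A', 26 -> 'Z', 27 -> 'AA'."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while n:
        # Shift to 0-based before dividing, because there is no "zero" letter
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters
```

The n - 1 shift is what makes 26 map to Z rather than to a two-letter code.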

Upvotes: 3

kgibm

Reputation: 1038

Combining the other answers and comments and also supporting multi-indices:

def autosize_excel_columns(worksheet, df):
  autosize_excel_columns_df(worksheet, df.index.to_frame())
  autosize_excel_columns_df(worksheet, df, offset=df.index.nlevels)

def autosize_excel_columns_df(worksheet, df, offset=0):
  for idx, col in enumerate(df):
    series = df[col]
    max_len = max((
      series.astype(str).map(len).max(),
      len(str(series.name))
    )) + 1
    worksheet.set_column(idx+offset, idx+offset, max_len)

sheetname=...
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
worksheet = writer.sheets[sheetname]
autosize_excel_columns(worksheet, df)
writer.close()  # writer.save() was removed in pandas 2.0

Upvotes: 6

alichaudry

Reputation: 2743

Inspired by user6178746's answer, I have the following:

# Given a dict of dataframes, for example:
# dfs = {'gadgets': df_gadgets, 'widgets': df_widgets}

writer = pd.ExcelWriter(filename, engine='xlsxwriter')
for sheetname, df in dfs.items():  # loop through `dict` of dataframes
    df.to_excel(writer, sheet_name=sheetname)  # send df to writer
    worksheet = writer.sheets[sheetname]  # pull worksheet object
    offset = df.index.nlevels  # the index is written first, shifting the data columns right
    for idx, col in enumerate(df):  # loop through all columns
        series = df[col]
        max_len = max((
            series.astype(str).map(len).max(),  # len of largest item
            len(str(series.name))  # len of column name/header
            )) + 1  # adding a little extra space
        worksheet.set_column(idx + offset, idx + offset, max_len)  # set column width
writer.close()  # writer.save() was removed in pandas 2.0

Upvotes: 132

Ashish Jith

Reputation: 71

The easiest solution is to specify the column width in the set_column method (XlsxWriter engine), applying one fixed width to every sheet:

for worksheet in writer.sheets.values():
    worksheet.set_column(0, last_column_value, required_width_constant)

Upvotes: 1

user6178746

Reputation:

I'm posting this because I just ran into the same issue and found that the official documentation for XlsxWriter and pandas still lists this functionality as unsupported. I hacked together a solution that solved the issue I was having. I basically just iterate through each column and use worksheet.set_column to set the column width equal to the max length of the contents of that column.

One important note, however: the width is based on the column values, and the header length is used only when it is longer than the longest value. Hope this helps someone :)

import pandas as pd
import sqlalchemy as sa
import urllib.parse


read_server = 'serverName'
read_database = 'databaseName'

read_params = urllib.parse.quote_plus("DRIVER={SQL Server};SERVER="+read_server+";DATABASE="+read_database+";TRUSTED_CONNECTION=Yes")
read_engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % read_params)

#Output some SQL Server data into a dataframe
my_sql_query = """ SELECT * FROM dbo.my_table """
my_dataframe = pd.read_sql_query(my_sql_query,con=read_engine)

#Set destination directory to save excel.
xlsFilepath = r'H:\my_project' + "\\" + 'my_file_name.xlsx'
writer = pd.ExcelWriter(xlsFilepath, engine='xlsxwriter')

#Write excel to file using pandas to_excel
my_dataframe.to_excel(writer, startrow = 1, sheet_name='Sheet1', index=False)

#Indicate workbook and worksheet for formatting
workbook = writer.book
worksheet = writer.sheets['Sheet1']

#Iterate through each column and set the width == the max length in that column. A padding length of 2 is also added.
for i, col in enumerate(my_dataframe.columns):
    # find length of column i
    column_len = my_dataframe[col].astype(str).str.len().max()
    # Setting the length if the column header is larger
    # than the max column value length
    column_len = max(column_len, len(col)) + 2
    # set the column length
    worksheet.set_column(i, i, column_len)
writer.close()  # writer.save() was removed in pandas 2.0

Upvotes: 37

ojdo

Reputation: 8900

There is probably no automatic way to do it right now, but since you use openpyxl, the following line (adapted from another answer by user Bufke on how to do it manually) lets you specify a sane value (in character widths):

writer.sheets['Summary'].column_dimensions['A'].width = 15

Upvotes: 26
