When you print a pandas DataFrame, which calls DataFrame.to_string, it normally inserts a minimum of 2 spaces between the columns. For example, this code
import pandas as pd
df = pd.DataFrame({
    "c1": ("a", "bb", "ccc", "dddd", "eeeeee"),
    "c2": (11, 22, 33, 44, 55),
    "a3235235235": [1, 2, 3, 4, 5],
})
print(df)
outputs
       c1  c2  a3235235235
0       a  11            1
1      bb  22            2
2     ccc  33            3
3    dddd  44            4
4  eeeeee  55            5
which has a minimum of 2 spaces between each column.
I am copying DataFrames printed on the console and pasting them into documents, and I have received feedback that they are hard to read: people would like more space between the columns.
Is there a standard way to do that?
I see no option in either DataFrame.to_string or pandas.set_option.
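For reference, DataFrame.to_string does accept a col_space argument, although it sets a minimum column width rather than a gap between columns, so a single fixed value does little for columns that are already wide. A sketch of how it can still emulate a wider gap, assuming pandas 1.1 or later (where col_space may be a dict keyed by column label) and columns whose printed text matches str(value), such as strings and integers (floats are rounded when printed, so their widths would be overestimated). with_gap is a hypothetical helper name.

```python
import pandas as pd

def with_gap(df: pd.DataFrame, n: int = 4) -> str:
    """Hypothetical helper: render df so each column's widest entry sits
    n spaces from its left neighbour (pandas' own gap is the floor)."""
    widths = {}
    for col in df.columns:
        # MultiIndex labels are tuples; measure the longest level
        if isinstance(col, tuple):
            label_len = max(len(str(x)) for x in col)
        else:
            label_len = len(str(col))
        # widest cell, measured on the plain str() form (assumed to match the printed form)
        cell_len = int(df[col].astype(str).str.len().max())
        # to_string inserts one separating space itself, hence n - 1
        widths[col] = max(label_len, cell_len) + n - 1
    return df.to_string(col_space=widths)

df = pd.DataFrame({
    "c1": ("a", "bb", "ccc", "dddd", "eeeeee"),
    "c2": (11, 22, 33, 44, 55),
    "a3235235235": [1, 2, 3, 4, 5],
})
print(with_gap(df, 4))
```

With n=4 the widest entry of each column (header included) ends up four spaces from its neighbour; narrower entries are right-aligned and so get more. Values of n below the default gap leave the output unchanged.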
I have done a web search and not found an answer. One question asks how to remove those 2 spaces, while another asks why sometimes only 1 space appears between columns instead of 2 (I have also seen this bug; I hope someone answers that question).
My hacky workaround is a function that converts a DataFrame's elements to type str and then prepends a string of the specified number of spaces to each element and to each column label.
This code (added to the code above)
def prependSpacesToColumns(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    spaces = ' ' * n
    df = df.copy()  # work on a copy so the caller's DataFrame keeps its labels
    # ensure every column name has the leading spaces:
    if isinstance(df.columns, pd.MultiIndex):
        for i in range(df.columns.nlevels):
            levelNew = [spaces + str(s) for s in df.columns.levels[i]]
            # set_levels returns a new index; its inplace= argument was removed in pandas 2.0
            df.columns = df.columns.set_levels(levelNew, level=i)
    else:
        df.columns = spaces + df.columns.astype(str)
    # ensure every element has the leading spaces:
    df = df.astype(str)
    df = spaces + df
    return df
dfSp = prependSpacesToColumns(df, 3)
print(dfSp)
outputs
          c1     c2    a3235235235
0          a     11              1
1         bb     22              2
2        ccc     33              3
3       dddd     44              4
4     eeeeee     55              5
which is the desired effect.
But surely pandas has some simple, built-in, standard way to do this. Did I miss it?
Also, the solution needs to handle a DataFrame whose columns are a MultiIndex. To continue the code example, consider this modification:
idx = (("Outer", "Inner1"), ("Outer", "Inner2"), ("Outer", "a3235235235"))
df.columns = pd.MultiIndex.from_tuples(idx)
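The inplace argument of MultiIndex.set_levels was deprecated in pandas 1.x and removed in 2.0, so assigning the returned index is the portable form. Another option, sketched here with a hypothetical helper prefix_labels, relies on Index.map: when the mapping function returns tuples, pandas builds a MultiIndex, so flat and MultiIndex columns can share one code path.

```python
import pandas as pd

def prefix_labels(columns: pd.Index, spaces: str) -> pd.Index:
    """Hypothetical helper: prepend `spaces` to every label (every level of a MultiIndex)."""
    if isinstance(columns, pd.MultiIndex):
        # a tuple-returning mapper makes Index.map produce a MultiIndex
        return columns.map(lambda t: tuple(spaces + str(x) for x in t))
    return columns.map(lambda c: spaces + str(c))

df = pd.DataFrame({
    "c1": ("a", "bb", "ccc", "dddd", "eeeeee"),
    "c2": (11, 22, 33, 44, 55),
    "a3235235235": [1, 2, 3, 4, 5],
})
idx = (("Outer", "Inner1"), ("Outer", "Inner2"), ("Outer", "a3235235235"))
df.columns = pd.MultiIndex.from_tuples(idx)

spaces = " " * 3
dfSp = spaces + df.astype(str)  # a new frame; df itself is left unchanged
dfSp.columns = prefix_labels(dfSp.columns, spaces)
print(dfSp)
```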
You can accomplish this through the formatters argument of to_string; it takes a bit of code to create the dictionary {column_label: formatting_function}. For each column, take the maximum character length of its values or the length of its header, whichever is greater, add some padding, and build a right-aligning formatter for that width.
Use partial from functools, because each formatter must be a one-argument function, yet we need to specify a different width for each column.
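To see why partial is needed: to_string calls each formatter with a single cell value, so the per-column fill width has to be bound in advance. A minimal sketch (the widths 4 and 8 are arbitrary, chosen only for illustration):

```python
from functools import partial

def get_fmt_str(x, fill):
    # right-align str(x) in a field `fill` characters wide
    return '{message: >{fill}}'.format(message=x, fill=fill)

# partial fixes `fill`, leaving the one-argument callable that to_string expects
narrow = partial(get_fmt_str, fill=4)
wide = partial(get_fmt_str, fill=8)

print(repr(narrow("bb")))  # '  bb'
print(repr(wide(22)))      # '      22'
```

Note that the format specification pads but never truncates, so a value longer than fill is printed in full.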
import pandas as pd
from functools import partial

df = pd.DataFrame({"c1": ("a", "bb", "ccc", "dddd", 'eeeeee'),
                   "c2": (11, 22, 33, 44, 55),
                   "a3235235235": [1, 2, 3, 4, 5]})

# Right-align x in a field `fill` characters wide
def get_fmt_str(x, fill):
    return '{message: >{fill}}'.format(message=x, fill=fill)

# Max character length per column
s = df.astype(str).agg(lambda x: x.str.len()).max()

pad = 6  # How many spaces between
fmts = {}
for idx, c_len in s.items():  # Series.iteritems was removed in pandas 2.0
    # Deal with MultiIndex tuples or simple string labels.
    if isinstance(idx, tuple):
        lab_len = max([len(str(x)) for x in idx])
    else:
        lab_len = len(str(idx))
    # to_string already puts one space between columns, hence pad - 1
    fill = max(lab_len, c_len) + pad - 1
    fmts[idx] = partial(get_fmt_str, fill=fill)

print(df.to_string(formatters=fmts))
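The same steps can be wrapped in a reusable function and applied to the MultiIndex frame from the question. This is a sketch rather than a drop-in replacement: to_string_padded is a hypothetical name, and it only aims to guarantee that at least pad spaces separate adjacent entries, since pandas adds its own leading space to some column types and so some gaps come out wider.

```python
from functools import partial

import pandas as pd

def get_fmt_str(x, fill):
    # right-align str(x) in a field `fill` characters wide
    return '{message: >{fill}}'.format(message=x, fill=fill)

def to_string_padded(df: pd.DataFrame, pad: int = 6) -> str:
    """Hypothetical helper: at least `pad` spaces between columns, via formatters."""
    # widest cell per column, measured on the str() form
    cell_len = df.astype(str).apply(lambda col: col.str.len()).max()
    fmts = {}
    for label, c_len in cell_len.items():
        # MultiIndex labels are tuples: use the longest level
        if isinstance(label, tuple):
            lab_len = max(len(str(x)) for x in label)
        else:
            lab_len = len(str(label))
        # to_string inserts one separating space itself, hence pad - 1
        fmts[label] = partial(get_fmt_str, fill=max(lab_len, int(c_len)) + pad - 1)
    return df.to_string(formatters=fmts)

df = pd.DataFrame({
    "c1": ("a", "bb", "ccc", "dddd", "eeeeee"),
    "c2": (11, 22, 33, 44, 55),
    "a3235235235": [1, 2, 3, 4, 5],
})
idx = (("Outer", "Inner1"), ("Outer", "Inner2"), ("Outer", "a3235235235"))
df.columns = pd.MultiIndex.from_tuples(idx)
print(to_string_padded(df, pad=6))
```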
            c1      c2      a3235235235
0            a      11                1
1           bb      22                2
2          ccc      33                3
3         dddd      44                4
4       eeeeee      55                5
# MultiIndex Output
Outer
Inner1 Inner2 a3235235235
0 a 11 1
1 bb 22 2
2 ccc 33 3
3 dddd 44 4
4 eeeeee 55 5