gozzilli

Reputation: 8347

Pandas info() to HTML

Pandas offers some summary statistics with the describe() function called on a DataFrame. The output of the function is another DataFrame, so it's easily exported to HTML with a call to to_html().

It also offers information about the DataFrame with the info() function, but that's printed out, returning None. Is there a way to get the same information as a DataFrame or any other way that can be exported to HTML?

Here is a sample info() for reference:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 7 columns):
0    5 non-null float64
1    5 non-null float64
2    5 non-null float64
3    5 non-null float64
4    5 non-null float64
5    5 non-null float64
6    5 non-null float64
dtypes: float64(7)
memory usage: 360.0 bytes

Upvotes: 7

Views: 3023

Answers (4)

gozzilli

Reputation: 8347

With input from all these great answers, I ended up doing the following:

  • Strip the first three and last two lines, because they contain memory information and other details that are not in tabular format (and their number is fixed)
  • Convert the column information (datatypes in the snippet below) into a pandas DataFrame using StringIO
  • Name the columns "count", "null" and "dtype"
  • Return the column info, ready for HTML export, and the plain text of the remaining lines (first 3 and last 2)

So the result is this:

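import pandas as pd
from io import StringIO

def process_content_info(content: pd.DataFrame):
    # Capture the text that info() would otherwise print
    content_info = StringIO()
    content.info(buf=content_info)
    str_ = content_info.getvalue()

    # The output ends with a newline, so the last element of lines is ""
    lines = str_.split("\n")
    # Column rows only; assumes the older info() layout shown in the question,
    # which has no header rows above the per-column lines
    table = StringIO("\n".join(lines[3:-3]))
    datatypes = pd.read_table(table, sep=r"\s+",
                   names=["column", "count", "null", "dtype"])
    datatypes.set_index("column", inplace=True)

    # First three lines plus the dtypes and memory usage lines
    info = "\n".join(lines[0:3] + lines[-3:-1])

    return info, datatypes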

Perhaps the second StringIO can be simplified, but anyway this achieves what I needed.
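
The text parsing above depends on the exact layout that info() prints, which differs between pandas versions. As a sketch of an alternative, the same per-column facts can be assembled directly from the DataFrame, with no string handling. The helper name info_table and its column labels are illustrative assumptions, not part of pandas.

```python
import pandas as pd


def info_table(df: pd.DataFrame) -> pd.DataFrame:
    """Build a per-column summary similar to what info() prints.

    Hypothetical helper: the name and column labels are chosen for illustration.
    """
    return pd.DataFrame({
        "non_null": df.count(),          # non-null values per column
        "null": df.isna().sum(),         # missing values per column
        "dtype": df.dtypes.astype(str),  # dtype names as plain strings
    })


frame = pd.DataFrame({"A": [1.0, 2.0, None], "B": ["x", "y", "z"]})
summary = info_table(frame)
summary_html = summary.to_html()  # a regular DataFrame, so to_html() works
```

This leaves out the index range and memory usage lines; if those are needed, they still have to come from the info() text.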

Upvotes: 1

Allen Qin

Reputation: 19957

from io import StringIO
import pandas as pd

output = StringIO()
# Write df.info() to a string buffer (df is an existing DataFrame)
df.info(buf=output)
# Put the info back into a DataFrame so you can use to_html()
df_info = pd.DataFrame(columns=['DF INFO'], data=output.getvalue().split('\n'))
df_info.to_html()

Upvotes: 1

FLab

Reputation: 7496

One solution is to save the output of info() to a writable buffer (using the buf argument) and then convert it to HTML.

Below is an example using a txt file as the buffer, but this could easily be done in memory using StringIO.

import pandas as pd
import numpy as np

frame = pd.DataFrame(np.random.randn(100, 3), columns =['A', 'B', 'C'])

with open('test_pandas.txt', 'w') as buf:   # save to txt
    frame.info(buf=buf)

# Example to convert to html
with open("test_pandas.txt", "r") as contents, open("test_pandas.html", "w") as e:
    for line in contents:
        e.write("<pre>" + line.rstrip("\n") + "</pre> <br>\n")

The txt file contains the same text that info() would otherwise print to the console.

The variation using StringIO can be found in @jezrael's answer, so there is probably no point in updating this answer.

Upvotes: 1

jezrael

Reputation: 863301

I tried rewriting the other solution with StringIO; it is also necessary to use getvalue() with split:

from io import StringIO  # pandas.compat.StringIO is not available in newer pandas

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.randn(100, 3), columns =['A', 'B', 'C'])

a = StringIO()
frame.info(buf = a)  

# Example to convert to html
contents = a.getvalue().split('\n')
with open("test_pandas.html", "w") as e:
    for lines in contents:
        e.write("<pre>" + lines + "</pre> <br>\n")
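
One caveat when writing the raw lines into HTML: the first line of the info() output, <class 'pandas.core.frame.DataFrame'>, contains angle brackets, which a browser would treat as a tag. A minimal sketch that escapes the text with the standard library's html.escape and wraps it in a single <pre> block follows. The function name info_to_html is a hypothetical choice.

```python
import html
from io import StringIO

import numpy as np
import pandas as pd


def info_to_html(df: pd.DataFrame) -> str:
    """Return the info() text as an escaped HTML <pre> block (hypothetical helper)."""
    buf = StringIO()
    df.info(buf=buf)  # info() writes into buf instead of printing
    # Escape <, > and & so the text is displayed literally in the browser
    return "<pre>" + html.escape(buf.getvalue()) + "</pre>"


frame = pd.DataFrame(np.random.randn(100, 3), columns=["A", "B", "C"])
snippet = info_to_html(frame)
```

Using one <pre> for the whole text keeps the column alignment without a <br> after every line.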

Upvotes: 1
