Reputation: 1232

Programmatically convert pandas dataframe to markdown table

I have a Pandas Dataframe generated from a database, which has data with mixed encodings. For example:

+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+

As seen there is a mix of English and Norwegian (encoded as ISO-8859-1 in the database I think). I need to get the contents of this Dataframe output as a Markdown table, but without getting problems with encoding. I followed this answer (from the question Generate Markdown tables?) and got the following:

import sys, sqlite3

db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, longest_sentence, shortest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()

rows = []
for index, row in df.iterrows():
    items = (row['date'], 
             row['path'], 
             row['language'], 
             row['shortest_sentence'],
             row['longest_sentence'], 
             row['number_words'], 
             row['readability_consensus'])
    rows.append(items)

headings = ['Date', 
            'Path', 
            'Language',
            'Shortest Sentence', 
            'Longest Sentence since', 
            'Words',
            'Grade level']

fields = [0, 1, 2, 3, 4, 5, 6]
align = [('^', '<'), ('^', '^'), ('^', '<'), ('^', '^'), ('^', '>'),
         ('^','^'), ('^','^')]

table(sys.stdout, rows, fields, headings, align)

However, this yields an UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128) error. How can I output the Dataframe as a Markdown table? That is, for the purpose of storing this code in a file for use in writing a Markdown document. I need the output to look like this:

| ID | path                    | language | date       | longest_sentence                               | shortest_sentence                                      | number_words | readability_consensus |
|----|-------------------------|----------|------------|------------------------------------------------|--------------------------------------------------------|--------------|-----------------------|
| 0  | data/Eng/Sagitarius.txt | Eng      | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...  | 306          | 11th and 12th grade   |
| 31 | data/Nor/Høylandet.txt  | Nor      | 2015-07-22 | Høgskolen i Østfold er et eksempel...          | Som skuespiller har jeg både...                        | 253          | 15th and 16th grade   |

Upvotes: 54

Answers (13)

kpykc

Reputation: 663

Improving the answer further, for use in IPython Notebook:

def pandas_df_to_markdown_table(df):
    from IPython.display import Markdown, display
    fmt = ['---' for i in range(len(df.columns))]
    df_fmt = pd.DataFrame([fmt], columns=df.columns)
    df_formatted = pd.concat([df_fmt, df])
    display(Markdown(df_formatted.to_csv(sep="|", index=False)))

pandas_df_to_markdown_table(infodf)

Or use tabulate:

pip install tabulate

Examples of use are in the documentation.

Update

As of pandas 1.0 DataFrame to markdown is available. Please see answer from @timvink (docs)

Upvotes: 40

timvink

Reputation: 696

Pandas 1.0 was released 29 january 2020 and supports markdown conversion, so you can now do this directly!

Example taken from the docs:

df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])
print(df.to_markdown())

|    |   A |   B |
|:---|----:|----:|
| a  |   1 |   1 |
| a  |   2 |   2 |
| b  |   3 |   3 |

Or without the index:

print(df.to_markdown(index=False)) # use 'showindex' for pandas < 1.1

|   A |   B |
|----:|----:|
|   1 |   1 |
|   2 |   2 |
|   3 |   3 |

Upvotes: 50

sam

Reputation: 2311

Pandas have merged a PR to support df.to_markdown() method. You can find more details here It should be available soon.

Upvotes: 2

user4157482

Reputation: 849

I recommend python-tabulate library for generating ascii-tables. The library supports pandas.DataFrame as well.

Here is how to use it:

from pandas import DataFrame
from tabulate import tabulate

df = DataFrame({
    "weekday": ["monday", "thursday", "wednesday"],
    "temperature": [20, 30, 25],
    "precipitation": [100, 200, 150],
}).set_index("weekday")

print(tabulate(df, tablefmt="pipe", headers="keys"))

Output:

| weekday   |   temperature |   precipitation |
|:----------|--------------:|----------------:|
| monday    |            20 |             100 |
| thursday  |            30 |             200 |
| wednesday |            25 |             150 |

Upvotes: 40

Peter Zagubisalo

Reputation: 393

Yet another solution. This time via thin wrapper around tabulate: tabulatehelper

import numpy as np
import pandas as pd
import tabulatehelper as th

df = pd.DataFrame(np.random.random(16).reshape(4, 4), columns=('a', 'b', 'c', 'd'))
print(th.md_table(df, formats={-1: 'c'}))

Output:

|        a |        b |        c |        d |
|---------:|---------:|---------:|:--------:|
| 0.413284 | 0.932373 | 0.277797 | 0.646333 |
| 0.552731 | 0.381826 | 0.141727 | 0.2483   |
| 0.779889 | 0.012458 | 0.308352 | 0.650859 |
| 0.301109 | 0.982111 | 0.994024 | 0.43551  |

Upvotes: 2

Anake

Reputation: 7659

For those looking for how to do this using tabulate, I thought I'd put this here to save you some time:

print(tabulate(df, tablefmt="pipe", headers="keys", showindex=False))

Upvotes: 2

Ilya Prokin

Reputation: 722

Using external tool pandoc and pipe:

def to_markdown(df):
    from subprocess import Popen, PIPE
    s = df.to_latex()
    p = Popen('pandoc -f latex -t markdown',
              stdin=PIPE, stdout=PIPE, shell=True)
    stdoutdata, _ = p.communicate(input=s.encode("utf-8"))
    return stdoutdata.decode("utf-8")

Upvotes: 1

Gustavo Bezerra

Reputation: 11043

Here's an example function using pytablewriter and some regular expressions to make the markdown table more similar to how a dataframe looks on Jupyter (with the row headers bold).

import io
import re
import pandas as pd
import pytablewriter

def df_to_markdown(df):
    """
    Converts Pandas DataFrame to markdown table,
    making the index bold (as in Jupyter) unless it's a
    pd.RangeIndex, in which case the index is completely dropped.
    Returns a string containing markdown table.
    """
    isRangeIndex = isinstance(df.index, pd.RangeIndex)
    if not isRangeIndex:
        df = df.reset_index()
    writer = pytablewriter.MarkdownTableWriter()
    writer.stream = io.StringIO()
    writer.header_list = df.columns
    writer.value_matrix = df.values
    writer.write_table()
    writer.stream.seek(0)
    table = writer.stream.readlines()

    if isRangeIndex:
        return ''.join(table)
    else:
        # Make the indexes bold
        new_table = table[:2]
        for line in table[2:]:
            new_table.append(re.sub('^(.*?)\|', r'**\1**|', line))    

        return ''.join(new_table)

Upvotes: 1

dubbbdan

Reputation: 2740

I have tried several of the above solutions in this post and found this worked most consistently.

To convert a pandas data frame to a markdown table I suggest using pytablewriter. Using the data provided in this post:

import pandas as pd
import pytablewriter
from StringIO import StringIO

c = StringIO("""ID, path,language, date,longest_sentence, shortest_sentence, number_words , readability_consensus 
0, data/Eng/Sagitarius.txt , Eng, 2015-09-17 , With administrative experience in the prepa... , I am able to relocate internationally on short not..., 306, 11th and 12th grade
31 , data/Nor/Høylandet.txt  , Nor, 2015-07-22 , Høgskolen i Østfold er et eksempel..., Som skuespiller har jeg både..., 253, 15th and 16th grade
""")
df = pd.read_csv(c,sep=',',index_col=['ID'])

writer = pytablewriter.MarkdownTableWriter()
writer.table_name = "example_table"
writer.header_list = list(df.columns.values)
writer.value_matrix = df.values.tolist()
writer.write_table()

This results in:

# example_table
ID |           path           |language|    date    |                longest_sentence                |                   shortest_sentence                  | number_words | readability_consensus 
--:|--------------------------|--------|------------|------------------------------------------------|------------------------------------------------------|-------------:|-----------------------
  0| data/Eng/Sagitarius.txt  | Eng    | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...|           306| 11th and 12th grade   
 31| data/Nor/Høylandet.txt  | Nor    | 2015-07-22 | Høgskolen i Østfold er et eksempel...        | Som skuespiller har jeg både...                      |           253| 15th and 16th grade

Here is a markdown rendered screenshot.

Upvotes: 6

Daniel Himmelstein

Reputation: 1919

Export a DataFrame to markdown

I created the following function for exporting a pandas.DataFrame to markdown in Python:

def df_to_markdown(df, float_format='%.2g'):
    """
    Export a pandas.DataFrame to markdown-formatted text.
    DataFrame should not contain any `|` characters.
    """
    from os import linesep
    return linesep.join([
        '|'.join(df.columns),
        '|'.join(4 * '-' for i in df.columns),
        df.to_csv(sep='|', index=False, header=False, float_format=float_format)
    ]).replace('|', ' | ')

This function may not automatically fix the encoding issues of the OP, but that is a different issue than converting from pandas to markdown.

Upvotes: 7

Rohit

Reputation: 6020

Try this out. I got it to work.

See the screenshot of my markdown file converted to HTML at the end of this answer.

import pandas as pd

# You don't need these two lines
# as you already have your DataFrame in memory
df = pd.read_csv("nor.txt", sep="|")
df.drop(df.columns[-1], axis=1)

# Get column names
cols = df.columns

# Create a new DataFrame with just the markdown
# strings
df2 = pd.DataFrame([['---',]*len(cols)], columns=cols)

#Create a new concatenated DataFrame
df3 = pd.concat([df2, df])

#Save as markdown
df3.to_csv("nor.md", sep="|", index=False)

Upvotes: 11

Alastair McCormack

Reputation: 27724

sqlite3 returns Unicodes by default for TEXT fields. Everything was set up to work before you introduced the table() function from an external source (that you did not provide in your question).

The table() function has str() calls which do not provide an encoding, so ASCII is used to protect you.

You need to re-write table() not to do this, especially as you've got Unicode objects. You may have some success by simply replacing str() with unicode()

Upvotes: 0

OleVik

Reputation: 1232

Right, so I've taken a leaf from a question suggested by Rohit (Python - Encoding string - Swedish Letters), extended his answer, and came up with the following:

# Enforce UTF-8 encoding
import sys
stdin, stdout = sys.stdin, sys.stdout
reload(sys)
sys.stdin, sys.stdout = stdin, stdout
sys.setdefaultencoding('UTF-8')

# SQLite3 database
import sqlite3
# Pandas: Data structures and data analysis tools
import pandas as pd

# Read database, attach as Pandas dataframe
db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, shortest_sentence, longest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()
df.columns = ['Path', 'Language', 'Date', 'Shortest Sentence', 'Longest Sentence', 'Words', 'Readability Consensus']

# Parse Dataframe and apply Markdown, then save as 'table.md'
cols = df.columns
df2 = pd.DataFrame([['---','---','---','---','---','---','---']], columns=cols)
df3 = pd.concat([df2, df])
df3.to_csv("table.md", sep="|", index=False)

An important precursor to this is that the shortest_sentence and longest_sentence columns do not contain unnecessary line breaks, as removed by applying .replace('\n', ' ').replace('\r', '') to them before submitting into the SQLite database. It appears that the solution is not to enforce the language-specific encoding (ISO-8859-1 for Norwegian), but rather that UTF-8 is used instead of the default ASCII.

I ran this through my IPython notebook (Python 2.7.10) and got a table like the following (fixed spacing for appearance here):

| Path                    | Language | Date       | Shortest Sentence                                                                            | Longest Sentence                                                                                                                                                                                                                                         | Words | Readability Consensus |
|-------------------------|----------|------------|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|-----------------------|
| data/Eng/Something1.txt | Eng      | 2015-09-17 | I am able to relocate to London on short notice.                                             | With my administrative experience in the preparation of the structure and content of seminars in various courses, and critiquing academic papers on various levels, I am confident that I can execute the work required as an editorial assistant.       | 306   | 11th and 12th grade   |
| data/Nor/NoeNorrønt.txt | Nor      | 2015-09-17 | Jeg har grundig kjennskap til Microsoft Office og Adobe.                                     | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 205   | 18th and 19th grade   |
| data/Nor/Ørret.txt.txt  | Nor      | 2015-09-17 | Jeg håper på positiv tilbakemelding, og møter naturligvis til intervju hvis det er ønskelig. | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 160   | 18th and 19th grade   |

Thus, a Markdown table without problems with encoding.

Upvotes: 1

Programmatically convert pandas dataframe to markdown table

Answers (13)

Export a DataFrame to markdown

Related Questions