Reputation: 1232
I have a Pandas Dataframe generated from a database, which has data with mixed encodings. For example:
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| ID | path | language | date | longest_sentence | shortest_sentence | number_words | readability_consensus |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 0 | data/Eng/Sagitarius.txt | Eng | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not... | 306 | 11th and 12th grade |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
| 31 | data/Nor/Høylandet.txt | Nor | 2015-07-22 | Høgskolen i Østfold er et eksempel... | Som skuespiller har jeg både... | 253 | 15th and 16th grade |
+----+-------------------------+----------+------------+------------------------------------------------+--------------------------------------------------------+--------------+-----------------------+
As seen there is a mix of English and Norwegian (encoded as ISO-8859-1 in the database I think). I need to get the contents of this Dataframe output as a Markdown table, but without getting problems with encoding. I followed this answer (from the question Generate Markdown tables?) and got the following:
import sys, sqlite3
db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, longest_sentence, shortest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()
rows = []
for index, row in df.iterrows():
items = (row['date'],
row['path'],
row['language'],
row['shortest_sentence'],
row['longest_sentence'],
row['number_words'],
row['readability_consensus'])
rows.append(items)
headings = ['Date',
'Path',
'Language',
'Shortest Sentence',
'Longest Sentence since',
'Words',
'Grade level']
fields = [0, 1, 2, 3, 4, 5, 6]
align = [('^', '<'), ('^', '^'), ('^', '<'), ('^', '^'), ('^', '>'),
('^','^'), ('^','^')]
table(sys.stdout, rows, fields, headings, align)
However, this yields an UnicodeEncodeError: 'ascii' codec can't encode character u'\xe5' in position 72: ordinal not in range(128)
error. How can I output the Dataframe as a Markdown table? That is, for the purpose of storing this code in a file for use in writing a Markdown document. I need the output to look like this:
| ID | path | language | date | longest_sentence | shortest_sentence | number_words | readability_consensus |
|----|-------------------------|----------|------------|------------------------------------------------|--------------------------------------------------------|--------------|-----------------------|
| 0 | data/Eng/Sagitarius.txt | Eng | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not... | 306 | 11th and 12th grade |
| 31 | data/Nor/Høylandet.txt | Nor | 2015-07-22 | Høgskolen i Østfold er et eksempel... | Som skuespiller har jeg både... | 253 | 15th and 16th grade |
Upvotes: 54
Views: 73038
Reputation: 663
Improving the answer further, for use in IPython Notebook:
def pandas_df_to_markdown_table(df):
from IPython.display import Markdown, display
fmt = ['---' for i in range(len(df.columns))]
df_fmt = pd.DataFrame([fmt], columns=df.columns)
df_formatted = pd.concat([df_fmt, df])
display(Markdown(df_formatted.to_csv(sep="|", index=False)))
pandas_df_to_markdown_table(infodf)
Or use tabulate:
pip install tabulate
Examples of use are in the documentation.
Update
As of pandas 1.0 DataFrame to markdown is available. Please see answer from @timvink (docs)
Upvotes: 40
Reputation: 696
Pandas 1.0 was released 29 january 2020 and supports markdown conversion, so you can now do this directly!
Example taken from the docs:
df = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=['a', 'a', 'b'])
print(df.to_markdown())
| | A | B |
|:---|----:|----:|
| a | 1 | 1 |
| a | 2 | 2 |
| b | 3 | 3 |
Or without the index:
print(df.to_markdown(index=False)) # use 'showindex' for pandas < 1.1
| A | B |
|----:|----:|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
Upvotes: 50
Reputation: 2311
Pandas have merged a PR to support df.to_markdown() method. You can find more details here It should be available soon.
Upvotes: 2
Reputation: 849
I recommend python-tabulate library for generating ascii-tables. The library supports pandas.DataFrame
as well.
Here is how to use it:
from pandas import DataFrame
from tabulate import tabulate
df = DataFrame({
"weekday": ["monday", "thursday", "wednesday"],
"temperature": [20, 30, 25],
"precipitation": [100, 200, 150],
}).set_index("weekday")
print(tabulate(df, tablefmt="pipe", headers="keys"))
Output:
| weekday | temperature | precipitation |
|:----------|--------------:|----------------:|
| monday | 20 | 100 |
| thursday | 30 | 200 |
| wednesday | 25 | 150 |
Upvotes: 40
Reputation: 393
Yet another solution. This time via thin wrapper around tabulate: tabulatehelper
import numpy as np
import pandas as pd
import tabulatehelper as th
df = pd.DataFrame(np.random.random(16).reshape(4, 4), columns=('a', 'b', 'c', 'd'))
print(th.md_table(df, formats={-1: 'c'}))
Output:
| a | b | c | d |
|---------:|---------:|---------:|:--------:|
| 0.413284 | 0.932373 | 0.277797 | 0.646333 |
| 0.552731 | 0.381826 | 0.141727 | 0.2483 |
| 0.779889 | 0.012458 | 0.308352 | 0.650859 |
| 0.301109 | 0.982111 | 0.994024 | 0.43551 |
Upvotes: 2
Reputation: 7659
For those looking for how to do this using tabulate
, I thought I'd put this here to save you some time:
print(tabulate(df, tablefmt="pipe", headers="keys", showindex=False))
Upvotes: 2
Reputation: 722
Using external tool pandoc
and pipe:
def to_markdown(df):
from subprocess import Popen, PIPE
s = df.to_latex()
p = Popen('pandoc -f latex -t markdown',
stdin=PIPE, stdout=PIPE, shell=True)
stdoutdata, _ = p.communicate(input=s.encode("utf-8"))
return stdoutdata.decode("utf-8")
Upvotes: 1
Reputation: 11043
Here's an example function using pytablewriter
and some regular expressions to make the markdown table more similar to how a dataframe looks on Jupyter (with the row headers bold).
import io
import re
import pandas as pd
import pytablewriter
def df_to_markdown(df):
"""
Converts Pandas DataFrame to markdown table,
making the index bold (as in Jupyter) unless it's a
pd.RangeIndex, in which case the index is completely dropped.
Returns a string containing markdown table.
"""
isRangeIndex = isinstance(df.index, pd.RangeIndex)
if not isRangeIndex:
df = df.reset_index()
writer = pytablewriter.MarkdownTableWriter()
writer.stream = io.StringIO()
writer.header_list = df.columns
writer.value_matrix = df.values
writer.write_table()
writer.stream.seek(0)
table = writer.stream.readlines()
if isRangeIndex:
return ''.join(table)
else:
# Make the indexes bold
new_table = table[:2]
for line in table[2:]:
new_table.append(re.sub('^(.*?)\|', r'**\1**|', line))
return ''.join(new_table)
Upvotes: 1
Reputation: 2740
I have tried several of the above solutions in this post and found this worked most consistently.
To convert a pandas data frame to a markdown table I suggest using pytablewriter. Using the data provided in this post:
import pandas as pd
import pytablewriter
from StringIO import StringIO
c = StringIO("""ID, path,language, date,longest_sentence, shortest_sentence, number_words , readability_consensus
0, data/Eng/Sagitarius.txt , Eng, 2015-09-17 , With administrative experience in the prepa... , I am able to relocate internationally on short not..., 306, 11th and 12th grade
31 , data/Nor/Høylandet.txt , Nor, 2015-07-22 , Høgskolen i Østfold er et eksempel..., Som skuespiller har jeg både..., 253, 15th and 16th grade
""")
df = pd.read_csv(c,sep=',',index_col=['ID'])
writer = pytablewriter.MarkdownTableWriter()
writer.table_name = "example_table"
writer.header_list = list(df.columns.values)
writer.value_matrix = df.values.tolist()
writer.write_table()
This results in:
# example_table
ID | path |language| date | longest_sentence | shortest_sentence | number_words | readability_consensus
--:|--------------------------|--------|------------|------------------------------------------------|------------------------------------------------------|-------------:|-----------------------
0| data/Eng/Sagitarius.txt | Eng | 2015-09-17 | With administrative experience in the prepa... | I am able to relocate internationally on short not...| 306| 11th and 12th grade
31| data/Nor/Høylandet.txt | Nor | 2015-07-22 | Høgskolen i Østfold er et eksempel... | Som skuespiller har jeg både... | 253| 15th and 16th grade
Here is a markdown rendered screenshot.
Upvotes: 6
Reputation: 1919
I created the following function for exporting a pandas.DataFrame to markdown in Python:
def df_to_markdown(df, float_format='%.2g'):
"""
Export a pandas.DataFrame to markdown-formatted text.
DataFrame should not contain any `|` characters.
"""
from os import linesep
return linesep.join([
'|'.join(df.columns),
'|'.join(4 * '-' for i in df.columns),
df.to_csv(sep='|', index=False, header=False, float_format=float_format)
]).replace('|', ' | ')
This function may not automatically fix the encoding issues of the OP, but that is a different issue than converting from pandas to markdown.
Upvotes: 7
Reputation: 6020
Try this out. I got it to work.
See the screenshot of my markdown file converted to HTML at the end of this answer.
import pandas as pd
# You don't need these two lines
# as you already have your DataFrame in memory
df = pd.read_csv("nor.txt", sep="|")
df.drop(df.columns[-1], axis=1)
# Get column names
cols = df.columns
# Create a new DataFrame with just the markdown
# strings
df2 = pd.DataFrame([['---',]*len(cols)], columns=cols)
#Create a new concatenated DataFrame
df3 = pd.concat([df2, df])
#Save as markdown
df3.to_csv("nor.md", sep="|", index=False)
Upvotes: 11
Reputation: 27724
sqlite3 returns Unicodes by default for TEXT fields. Everything was set up to work before you introduced the table()
function from an external source (that you did not provide in your question).
The table()
function has str()
calls which do not provide an encoding, so ASCII is used to protect you.
You need to re-write table()
not to do this, especially as you've got Unicode objects. You may have some success by simply replacing str()
with unicode()
Upvotes: 0
Reputation: 1232
Right, so I've taken a leaf from a question suggested by Rohit (Python - Encoding string - Swedish Letters), extended his answer, and came up with the following:
# Enforce UTF-8 encoding
import sys
stdin, stdout = sys.stdin, sys.stdout
reload(sys)
sys.stdin, sys.stdout = stdin, stdout
sys.setdefaultencoding('UTF-8')
# SQLite3 database
import sqlite3
# Pandas: Data structures and data analysis tools
import pandas as pd
# Read database, attach as Pandas dataframe
db = sqlite3.connect("Applications.db")
df = pd.read_sql_query("SELECT path, language, date, shortest_sentence, longest_sentence, number_words, readability_consensus FROM applications ORDER BY date(date) DESC", db)
db.close()
df.columns = ['Path', 'Language', 'Date', 'Shortest Sentence', 'Longest Sentence', 'Words', 'Readability Consensus']
# Parse Dataframe and apply Markdown, then save as 'table.md'
cols = df.columns
df2 = pd.DataFrame([['---','---','---','---','---','---','---']], columns=cols)
df3 = pd.concat([df2, df])
df3.to_csv("table.md", sep="|", index=False)
An important precursor to this is that the shortest_sentence
and longest_sentence
columns do not contain unnecessary line breaks, as removed by applying .replace('\n', ' ').replace('\r', '')
to them before submitting into the SQLite database. It appears that the solution is not to enforce the language-specific encoding (ISO-8859-1
for Norwegian), but rather that UTF-8
is used instead of the default ASCII
.
I ran this through my IPython notebook (Python 2.7.10) and got a table like the following (fixed spacing for appearance here):
| Path | Language | Date | Shortest Sentence | Longest Sentence | Words | Readability Consensus |
|-------------------------|----------|------------|----------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------|-----------------------|
| data/Eng/Something1.txt | Eng | 2015-09-17 | I am able to relocate to London on short notice. | With my administrative experience in the preparation of the structure and content of seminars in various courses, and critiquing academic papers on various levels, I am confident that I can execute the work required as an editorial assistant. | 306 | 11th and 12th grade |
| data/Nor/NoeNorrønt.txt | Nor | 2015-09-17 | Jeg har grundig kjennskap til Microsoft Office og Adobe. | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 205 | 18th and 19th grade |
| data/Nor/Ørret.txt.txt | Nor | 2015-09-17 | Jeg håper på positiv tilbakemelding, og møter naturligvis til intervju hvis det er ønskelig. | I løpet av studiene har jeg vært salgsmedarbeider for et større konsern, hvor jeg solgte forsikring til studentene og de faglige ansatte ved universitetet i Trønderlag, samt renholdsarbeider i et annet, hvor jeg i en periode var avdelingsansvarlig. | 160 | 18th and 19th grade |
Thus, a Markdown table without problems with encoding.
Upvotes: 1