Reputation: 1349
I am trying to figure out the easiest way to convert Markdown table text into JSON using only Python. For example, consider this input string:
| Some Title | Some Description | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game | 5 |
| Bloodborne | This one is even better | 2 |
| Sekiro | This one is also pretty good | 110101 |
The output should be like this:
[
{"Some Title":"Dark Souls","Some Description":"This is a fun game","Some Number":5},
{"Some Title":"Bloodborne","Some Description":"This one is even better","Some Number":2},
{"Some Title":"Sekiro","Some Description":"This one is also pretty good","Some Number":110101}
]
Note: Ideally, the output should be RFC 8259 compliant, i.e. use double quotes " instead of single quotes ' around the keys and string values.
I've seen some JS libraries that do this, but nothing for Python.
Upvotes: 4
Views: 8175
Reputation: 7293
You could let the csv module do the main work and do something like the following:
import csv
import json
markdown_table = """| Some Title | Some Description | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game | 5 |
| Bloodborne | This one is even better | 2 |
| Sekiro | This one is also pretty good | 110101 |"""
lines = markdown_table.split("\n")
dict_reader = csv.DictReader(lines, delimiter="|")
data = []
# skip first row, i.e. the row between the header and data
for row in list(dict_reader)[1:]:
    # strip spaces and ignore the empty columns produced by the outer pipes
    r = {k.strip(): v.strip() for k, v in row.items() if k != ""}
    data.append(r)
print(json.dumps(data, indent=4))
This is the output (note that all values, including the numbers, come out as strings):
[
    {
        "Some Title": "Dark Souls",
        "Some Description": "This is a fun game",
        "Some Number": "5"
    },
    {
        "Some Title": "Bloodborne",
        "Some Description": "This one is even better",
        "Some Number": "2"
    },
    {
        "Some Title": "Sekiro",
        "Some Description": "This one is also pretty good",
        "Some Number": "110101"
    }
]
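The expected output in the question has `Some Number` as a JSON number, while the output above keeps it as a string. A minimal sketch of one way to post-process the rows, assuming that any cell made up only of an optionally signed run of ASCII digits should become an int. The helper name `coerce_ints` is hypothetical and not part of `csv` or `json`:

```python
import json
import re

# Assumption: a value counts as an integer if it is an optional sign plus digits.
_INT_RE = re.compile(r"[+-]?[0-9]+")

def coerce_ints(row):
    """Return a copy of row with integer-looking string values converted to int."""
    return {
        key: int(value) if _INT_RE.fullmatch(value) else value
        for key, value in row.items()
    }

# Two rows in the same shape the csv-based code produces.
rows = [
    {"Some Title": "Dark Souls", "Some Description": "This is a fun game", "Some Number": "5"},
    {"Some Title": "Sekiro", "Some Description": "This one is also pretty good", "Some Number": "110101"},
]
converted = [coerce_ints(r) for r in rows]
print(json.dumps(converted, indent=4))
```

Floats, booleans and empty cells are left as strings here; whether they should be converted too depends on the table.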
Upvotes: 5
Reputation: 3856
You can treat it as a multi-line string and parse it line by line, splitting at \n and |.
Simple code that does that:
import json
my_str='''| Some Title | Some Description | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game | 5 |
| Bloodborne | This one is even better | 2 |
| Sekiro | This one is also pretty good | 110101 |'''
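def mrkd2json(inp):
    lines = inp.split('\n')
    ret = []
    keys = []
    for i, l in enumerate(lines):
        if i == 0:
            # header row; the outer pipes leave empty first and last entries
            keys = [_i.strip() for _i in l.split('|')]
        elif i == 1:
            # separator row between header and data
            continue
        else:
            # data row; skip the empty first and last cells
            ret.append({keys[_i]: v.strip() for _i, v in enumerate(l.split('|')) if 0 < _i < len(keys) - 1})
    return json.dumps(ret, indent=4)
print(mrkd2json(my_str))
Output: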
[
    {
        "Some Title": "Dark Souls",
        "Some Description": "This is a fun game",
        "Some Number": "5"
    },
    {
        "Some Title": "Bloodborne",
        "Some Description": "This one is even better",
        "Some Number": "2"
    },
    {
        "Some Title": "Sekiro",
        "Some Description": "This one is also pretty good",
        "Some Number": "110101"
    }
]
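The code above always treats the second line as the separator row and assumes every line starts and ends with a pipe. A hedged sketch of a slightly more defensive variant, which checks whether the second line really looks like a separator (including alignment colons such as `:---:`) and tolerates missing outer pipes. The function names are hypothetical, and escaped pipes (`\|`) inside cells are not handled:

```python
import re

# Assumed shape of a separator cell: optional colons around one or more dashes.
_SEP_CELL = re.compile(r":?-+:?")

def split_row(line):
    """Split a table line on '|' and drop the empty edge cells, if present."""
    cells = [c.strip() for c in line.strip().split("|")]
    if cells and cells[0] == "":
        cells = cells[1:]
    if cells and cells[-1] == "":
        cells = cells[:-1]
    return cells

def is_separator(line):
    """True if every cell looks like '---', ':---', '---:' or ':---:'."""
    cells = split_row(line)
    return bool(cells) and all(_SEP_CELL.fullmatch(c) for c in cells)

def table_to_rows(text):
    """Return a list of dicts, one per data row, keyed by the header cells."""
    lines = [l for l in text.splitlines() if l.strip()]
    header = split_row(lines[0])
    # Skip the second line only if it actually is a separator row.
    start = 2 if len(lines) > 1 and is_separator(lines[1]) else 1
    return [dict(zip(header, split_row(l))) for l in lines[start:]]

table = "| A | B |\n|:--|--:|\n| 1 | x |\n| 2 | y |"
rows = table_to_rows(table)
print(rows)
```

The resulting list can be passed to `json.dumps` exactly as in the answer's code.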
PS: I don't know of any library that does this; I will update if I find one.
Upvotes: 3
Reputation: 1738
My approach was very similar to @Kuldeep Singh Sidhu's:
md_table = """
| Some Title | Some Description | Some Number |
|------------|------------------------------|-------------|
| Dark Souls | This is a fun game | 5 |
| Bloodborne | This one is even better | 2 |
| Sekiro | This one is also pretty good | 110101 |
"""
result = []
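# md_table[1:-1] drops the leading and trailing newline of the triple-quoted string
for n, line in enumerate(md_table[1:-1].split('\n')):
    data = {}
    if n == 0:
        # header row; [1:-1] drops the empty cells outside the outer pipes
        header = [t.strip() for t in line.split('|')[1:-1]]
    if n > 1:
        # data rows (n == 1 is the separator row and is skipped)
        values = [t.strip() for t in line.split('|')[1:-1]]
        for col, value in zip(header, values):
            data[col] = value
        result.append(data)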
Result is:
[{'Some Title': 'Dark Souls',
'Some Description': 'This is a fun game',
'Some Number': '5'},
{'Some Title': 'Bloodborne',
'Some Description': 'This one is even better',
'Some Number': '2'},
{'Some Title': 'Sekiro',
'Some Description': 'This one is also pretty good',
'Some Number': '110101'}]
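The result above is shown as a Python list, which is why it has single quotes. Since the question asks for RFC 8259 style double quotes, here is a short sketch of serializing such a list with the standard `json` module. The `result` literal is a shortened copy of the list built above, included only to keep the snippet self-contained:

```python
import json

# Shortened copy of the list produced by the loop above.
result = [
    {'Some Title': 'Dark Souls', 'Some Description': 'This is a fun game', 'Some Number': '5'},
    {'Some Title': 'Bloodborne', 'Some Description': 'This one is even better', 'Some Number': '2'},
]

# json.dumps always emits double-quoted keys and strings.
json_text = json.dumps(result)
print(json_text)

# Compact form without spaces after ',' and ':', closer to the question's example.
compact_text = json.dumps(result, separators=(",", ":"))
print(compact_text)
```

The numbers are still strings at this point; converting them to ints would need an extra step before serializing.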
Upvotes: 4