Reputation: 65
Here is my Json File
{
"highest_table": {
"items": [{
"key": "Human 1",
"columns": {
"Na$me": "Tom",
"Description(ms/2)": "Table Number One on the Top",
"A&ge": "24",
"Ge_nder": "M"
}
},
{
"key": "Human 2",
"columns": {
"Na$me": "John",
"Description(ms/2)": "Table Number One on the Top",
"A&ge": "23",
"Ge_nder": "M"
}
}
]
}
}
The goal is to remove any and all special characters in the column names (or if easier any special character at all in the .json file), and return a .json file. My initial thoughts is to convert it to pandas, remove special characters in the column heading and convert it back to a .json file.
This is what I have tried so far. Both of them print a single line only.
import json
from pandas.io.json import json_normalize
data_file = r"C:\characters.json"
with open(data_file) as data_file:
data = json.load(data_file)
df = json_normalize(data)
--
data_file = r"C:\characters.json"
df = pd.read_json(data_file)
How can I extract the columns, remove special characters and put them back in a .json file ?
Upvotes: 2
Views: 5500
Reputation: 77902
A bit Q&D - you'll have to provide a complete implementation for fixkey
but this should fix your problem.
import json
def fixkey(key):
# toy implementation
#print("fixing {}".format(key))
return key.replace("&", "").replace("$", "")
def normalize(data):
#print("normalizing {}".format(data))
if isinstance(data, dict):
data = {fixkey(key): normalize(value) for key, value in data.items()}
elif isinstance(data, list):
data = [normalize(item) for item in data]
return data
jsdata = """
{
"highest_table": {
"items": [{
"key": "Human 1",
"columns": {
"Na$me": "Tom",
"Description(ms/2)": "Table Number One on the Top",
"A&ge": "24",
"Ge_nder": "M"
}
},
{
"key": "Human 2",
"columns": {
"Na$me": "John",
"Description(ms/2)": "Table Number One on the Top",
"A&ge": "23",
"Ge_nder": "M"
}
}
]
}
}
"""
data = json.loads(jsdata)
data = normalize(data)
result = json.dumps(data, indent=2)
print(result)
Upvotes: 3
Reputation: 13175
Frankly this is ugly but I haven't been able to find a more generic approach. This is very specific to your particular JSON (the problem really needs solving in the API).
import json
response = """{
"highest_table": {
"items": [{
"key": "Human 1",
"columns": {
"Na$me": "Tom",
"Description(ms/2)": "Table Number One on the Top",
"A&ge": "24",
"Ge_nder": "M"
}
},
{
"key": "Human 2",
"columns": {
"Na$me": "John",
"Description(ms/2)": "Table Number One on the Top",
"A&ge": "23",
"Ge_nder": "M"
}
}
]
}
}"""
def fix_json(resp):
output = {'highest_table': {'items': []}}
for item in resp['highest_table']['items']:
inner_dict = item['columns']
fixed_values = {'Name': inner_dict['Na$me'],
'Description(ms/2)': inner_dict['Description(ms/2)'],
'Age': inner_dict['A&ge'],
'Gender': inner_dict['Ge_nder']
}
new_inner = {'key': item['key'], 'columns': fixed_values}
output['highest_table']['items'].append(new_inner)
return output
response = json.loads(response)
fixed = fix_json(response)
Upvotes: 1