I want to provide variable names and their data types in a JSON config file, which the system will then use for further processing, like:
myconfig.json
{
  "header": {
    "field1": "int",
    "field2": "long",
    "field3": "float",
    "field4": "string"
  }
}
When data is read from input files or other sources, another program will read this configuration, map the input records to this header and its data types, and apply any required transformations.
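A minimal sketch of how such a program could turn the config into converter functions. It assumes the type names are stored as quoted JSON strings (e.g. "int") and are looked up in a hypothetical TYPE_MAP dictionary; load_schema is likewise a hypothetical helper, not part of any library:

```python
import json

# Hypothetical mapping from the config's type names to Python converters.
# Python's int is arbitrary-precision, so "long" also maps to int.
TYPE_MAP = {"int": int, "long": int, "float": float, "string": str}

CONFIG_TEXT = '''
{
  "header": {
    "field1": "int",
    "field2": "long",
    "field3": "float",
    "field4": "string"
  }
}
'''

def load_schema(config_text):
    """Return a list of (field_name, converter) pairs in config order."""
    header = json.loads(config_text)["header"]
    # json.loads keeps key order (dicts preserve insertion order in Python 3.7+)
    return [(name, TYPE_MAP[type_name]) for name, type_name in header.items()]

schema = load_schema(CONFIG_TEXT)
print([(name, conv.__name__) for name, conv in schema])
# [('field1', 'int'), ('field2', 'int'), ('field3', 'float'), ('field4', 'str')]
```

Because the header is a JSON object, this relies on key order to line fields up with CSV columns; a list of [name, type] pairs would make the column order explicit.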
For example, if I have an input file like this:
3,10,3.5,abc
3,010,3,bcd
I would like to parse this file with the header schema above, validating and converting the data type of each field. The output will be JSON and will be fed to another system. For example, 010 in the second row should be transformed to 10 (as it fails with json.loads), and 3 should be converted to the float 3.0.
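The conversion rules described here can be checked directly in Python: json.loads rejects a number with a leading zero, while the built-in int and float constructors accept the raw strings and produce the desired values. A small illustration:

```python
import json

# JSON forbids leading zeros in numbers, so parsing "010" as JSON fails
try:
    json.loads("010")
    leading_zero_ok = True
except json.JSONDecodeError:
    leading_zero_ok = False

# The built-in constructors parse the raw CSV strings instead
as_int = int("010")      # leading zero is accepted
as_float = float("3")    # integer-looking text becomes a float

print(leading_zero_ok, as_int, as_float)  # False 10 3.0
```

This is why the values should be converted from the raw text with the target type's constructor rather than by feeding each field through json.loads.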
I have tried using INI and .conf files but could not achieve this. Can anyone help me achieve the behavior described above?
Here is a solution using pandas and json (the dtype names are NumPy dtypes):
import io
import json
import numpy as np
import pandas as pd
# Create a file (csv) for test purposes
data = '''\
3,10,3.5,abc
3,010,3,bcd'''
file = io.StringIO(data)
# Create a file (json) for test purposes
json_data = '''\
{
"header":[
["field1","int16"],
["field2","float32"],
["field3","float64"],
["field4","str"]]
}'''
# Load json to dictionary
json_d = json.loads(json_data)
# Fetch field names and dtypes
names = [i[0] for i in json_d['header']]
dtype = dict(json_d['header'])
# Now use pandas to read the whole thing to a dataframe
df = pd.read_csv(file,header=None,names=names,dtype=dtype)
# Output as a list of dicts (this can be written to a JSON file with json.dump())
df.to_dict(orient='records')
Result:
[{'field1': 3, 'field2': 10.0, 'field3': 3.5, 'field4': 'abc'},
{'field1': 3, 'field2': 10.0, 'field3': 3.0, 'field4': 'bcd'}]
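If pandas is not available, the same idea of reading field names and types from a JSON header can be sketched with the standard library's csv module. This assumes a header of [name, type] pairs like the one above, but with Python type names looked up in a hypothetical CONVERTERS dictionary instead of NumPy dtypes; field2 is declared as int here so that 010 becomes 10 rather than 10.0:

```python
import csv
import io
import json

# Hypothetical mapping from type names in the config to Python converters
CONVERTERS = {"int": int, "float": float, "str": str}

json_data = '''
{"header": [["field1", "int"], ["field2", "int"],
            ["field3", "float"], ["field4", "str"]]}
'''
data = "3,10,3.5,abc\n3,010,3,bcd\n"

header = json.loads(json_data)["header"]

records = []
for row in csv.reader(io.StringIO(data)):
    # Pair each raw value with its field name and converter, in header order
    records.append({name: CONVERTERS[type_name](value)
                    for (name, type_name), value in zip(header, row)})

print(json.dumps(records))
# [{"field1": 3, "field2": 10, "field3": 3.5, "field4": "abc"}, {"field1": 3, "field2": 10, "field3": 3.0, "field4": "bcd"}]
```

Note that zip silently ignores extra or missing columns, so a stricter version would also check that each row has as many values as the header has fields.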
You can use the code below:
import json

records = {}
with open('f.csv', 'r') as f:
    for i, line in enumerate(f):
        # split on commas and strip surrounding whitespace from each field
        temp = [x.strip() for x in line.split(',')]
        if temp == ['']:
            continue  # skip blank lines
        b = {}
        b['field1'] = int(temp[0])
        b['field2'] = int(temp[1])    # int('010') gives 10
        b['field3'] = float(temp[2])  # float('3') gives 3.0
        b['field4'] = temp[3]
        records['header' + str(i)] = b

with open('x.json', 'w') as outfile:
    json.dump(records, outfile)
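The code above assumes every value is well formed and stops with an exception at the first bad one. Since the question also asks for validation, here is a minimal sketch that collects errors per row instead. SCHEMA and convert_rows are hypothetical names, and the types are still hard-coded here (in practice they would be read from the JSON config):

```python
import csv
import io

# Hypothetical schema: (field name, converter) pairs in column order
SCHEMA = [("field1", int), ("field2", int), ("field3", float), ("field4", str)]

def convert_rows(lines):
    """Convert CSV rows per SCHEMA; return (records, errors) instead of raising."""
    records, errors = [], []
    for row_num, row in enumerate(csv.reader(lines), start=1):
        if len(row) != len(SCHEMA):
            errors.append(f"row {row_num}: expected {len(SCHEMA)} fields, got {len(row)}")
            continue
        record = {}
        for (name, convert), raw in zip(SCHEMA, row):
            try:
                record[name] = convert(raw.strip())
            except ValueError:
                errors.append(f"row {row_num}: {name}={raw!r} is not a valid {convert.__name__}")
                break
        else:
            # Only keep rows in which every field converted successfully
            records.append(record)
    return records, errors

sample = io.StringIO("3,10,3.5,abc\n3,010,3,bcd\nx,1,2.0,cde\n")
records, errors = convert_rows(sample)
print(records)
print(errors)  # ["row 3: field1='x' is not a valid int"]
```

Returning the errors alongside the good records lets the caller decide whether to reject the whole file or pass the valid rows on to the next system.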