Reputation: 56
I have a CSV file that holds the configuration information needed to create a YAML file (the final desired result). First, I am trying to convert each row of the CSV file to a dictionary; then I can easily convert the dictionary to YAML using yaml.dump(Created_Dictionary).
Sample Input file (test.csv):
fieldname|type|allowed|coerce
field_A|String|10,20,30|to_str
field_B|Integer||to_int
My source code using pandas library:
df = pd.read_csv("test.csv", sep="|")
df_to_dict = df.to_dict(orient='records')
print(df_to_dict) # print the dictionary
test_yaml = yaml.dump(df_to_dict)
print(test_yaml) # print the yaml file
Output I am getting for dictionary(df_to_dict):
[{'fieldname': 'field_A', 'type': 'String', 'allowed': '10,20,30'}, {'fieldname': 'field_B', 'type': 'Integer', 'allowed': nan}]
Output I am getting for yaml (test_yaml):
- allowed: 10,20,30
  fieldname: field_A
  type: String
- allowed: .nan
  fieldname: field_B
  type: Integer
Desired dictionary output (df_to_dict) is:
[
  {'field_A':
     {'type': 'String', 'allowed': '10,20,30', 'coerce': to_str}
  },
  {'field_B':
     {'type': 'Integer', 'allowed': '', 'coerce': to_int}
  }
]
Desired yaml output (test_yaml) is:
field_A:
  type: String
  allowed:
  - '10'
  - '20'
  - '30'
  coerce: to_str
field_B:
  type: Integer
  allowed:
  coerce: to_int
I see that the variable df_to_dict is a list of dictionaries. Do I have to loop through each list item and then build the dictionary for each row? I am not sure what the correct approach is. Any help is appreciated.
Upvotes: 1
Views: 1070
Reputation: 23146
Try:
df = pd.read_csv("test.csv", sep="|")
my_dict = df.set_index("fieldname").to_dict("index")

# convert allowed items to list
df["allowed"] = df["allowed"].str.split(",")
test_yaml = yaml.dump(df.set_index("fieldname").to_dict("index"), sort_keys=False)
>>> my_dict
{'field_A': {'type': 'String', 'allowed': '10,20,30', 'coerce': 'to_str'},
 'field_B': {'type': 'Integer', 'allowed': nan, 'coerce': 'to_int'}}
>>> print(test_yaml)
field_A:
  type: String
  allowed:
  - '10'
  - '20'
  - '30'
  coerce: to_str
field_B:
  type: Integer
  allowed: .nan
  coerce: to_int
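If you also want field_B's allowed to come out empty instead of .nan, one option is to replace the leftover NaN with None before dumping. A minimal sketch (using an inline copy of test.csv so it runs as-is):

```python
import io

import pandas as pd
import yaml

# inline stand-in for test.csv so the sketch is self-contained
csv_text = """fieldname|type|allowed|coerce
field_A|String|10,20,30|to_str
field_B|Integer||to_int
"""

df = pd.read_csv(io.StringIO(csv_text), sep="|")
df["allowed"] = df["allowed"].str.split(",")  # empty cells stay NaN

records = df.set_index("fieldname").to_dict("index")
for rec in records.values():
    # str.split leaves NaN (a float) where the cell was empty
    if not isinstance(rec["allowed"], list):
        rec["allowed"] = None

print(yaml.dump(records, sort_keys=False))
```

Note that PyYAML renders None as null rather than a truly blank value, which is usually close enough for a schema file.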
Upvotes: 2
Reputation: 11188
If you don't need Pandas (and I don't see any need for it in your description or your example), use Python's built-in csv module and its DictReader class.
import csv
import pprint

yaml_d = {}
with open('sample.csv', newline='') as f:
    reader = csv.DictReader(f, delimiter='|')
    for row in reader:
        fname = row['fieldname']
        allowed = row['allowed'].split(',')
        yaml_d[fname] = row              # "index" row by fieldname
        yaml_d[fname]['allowed'] = allowed
        del yaml_d[fname]['fieldname']   # remove now-extraneous fieldname from row

pprint.pprint(yaml_d)
gets me:
{'field_A': {'allowed': ['10', '20', '30'],
             'coerce': 'to_str',
             'type': 'String'},
 'field_B': {'allowed': [''], 'coerce': 'to_int', 'type': 'Integer'}}
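From there, yaml.dump(yaml_d) gives you the YAML. A sketch of the full round trip (with an inline stand-in for sample.csv, and the empty allowed cell mapped to None so it doesn't dump as ['']):

```python
import csv
import io

import yaml

# inline stand-in for sample.csv so the example runs as-is
csv_text = """fieldname|type|allowed|coerce
field_A|String|10,20,30|to_str
field_B|Integer||to_int
"""

yaml_d = {}
for row in csv.DictReader(io.StringIO(csv_text), delimiter='|'):
    fname = row.pop('fieldname')  # pop removes the now-extraneous key
    # empty cell -> None so PyYAML emits null instead of a one-item ['']
    row['allowed'] = row['allowed'].split(',') if row['allowed'] else None
    yaml_d[fname] = row

print(yaml.dump(yaml_d, sort_keys=False))
```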
Upvotes: 1
Reputation: 581
You want to play around with the index of your pandas DataFrame.
>>> df = pd.read_csv("test.csv", sep="|", index_col=0)
>>> df
              type   allowed
fieldname
field_A     String  10,20,30
field_B    Integer       NaN
>>> df.to_dict('index')  # returns dict like {index -> {column -> value}}
{'field_A': {'type': 'String', 'allowed': '10,20,30'}, 'field_B': {'type': 'Integer', 'allowed': nan}}
>>> print(yaml.dump(df.to_dict('index')))
field_A:
  allowed: 10,20,30
  type: String
field_B:
  allowed: .nan
  type: Integer
The .nan values you will have to deal with via a custom dump or a filter.
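For the filter approach, one minimal sketch is to walk the dict and swap any float NaN for None before dumping (the dict literal below just mimics the df.to_dict('index') output above):

```python
import math

import yaml

# example dict shaped like the df.to_dict('index') output above
data = {
    'field_A': {'type': 'String', 'allowed': '10,20,30'},
    'field_B': {'type': 'Integer', 'allowed': float('nan')},
}

def drop_nan(fields):
    """Replace float NaN values with None so PyYAML emits null, not .nan."""
    return {k: (None if isinstance(v, float) and math.isnan(v) else v)
            for k, v in fields.items()}

cleaned = {name: drop_nan(fields) for name, fields in data.items()}
print(yaml.dump(cleaned))
```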
See
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Upvotes: 1