Venugopal Bukkala

Reputation: 179

Mapping a YAML file in Python

I have a YAML file that contains a list of single-item mappings from ticker to feature category.

The current content, which includes the entries for BANKNIFTY_O_C_0_10_W, is:

index: [ BANKNIFTY_O_C_0_09_W: books,BANKNIFTY_O_C_0_09_W: trends,BANKNIFTY_O_C_0_09_W: trades,BANKNIFTY_O_C_0_09_W: relations,BANKNIFTY_O_P_0_09_W: books,BANKNIFTY_O_P_0_09_W: trends,BANKNIFTY_O_P_0_09_W: trades,BANKNIFTY_O_P_0_09_W: negrelations,BANKNIFTY_O_C_0_10_W: books,BANKNIFTY_O_C_0_10_W: trends,BANKNIFTY_O_C_0_10_W: trades,BANKNIFTY_O_C_0_10_W: relations,BANKNIFTY_O_C_0_10_W: options_banknifty_weekly,BANKNIFTY_O_P_0_10_W: books,BANKNIFTY_O_P_0_10_W: trends,BANKNIFTY_O_P_0_10_W: trades,BANKNIFTY_O_P_0_10_W: negrelations,BANKNIFTY_F_0: books,BANKNIFTY_F_0: trends,BANKNIFTY_F_0: trades,BANKNIFTY_F_0: relations,NIFTY_F_0: books,NIFTY_F_0: trends,NIFTY_F_0: trades,NIFTY_F_0: relations ]

I need the following output:

index: 
- BANKNIFTY_O_C_0_09_W: [books, trends, trades, relations]
- BANKNIFTY_O_P_0_09_W: [books, trends, trades, negrelations]
- BANKNIFTY_O_C_0_10_W: [books, trends, trades, relations, options_banknifty_weekly]
- BANKNIFTY_O_P_0_10_W: [books, trends, trades, negrelations]
- BANKNIFTY_F_0: [books, trends, trades, relations]
- NIFTY_F_0: [books, trends, trades, relations]

Upvotes: 0

Views: 976

Answers (1)

Anthon

Reputation: 76578

Your input is a single-item mapping whose value is a list of single-item mappings. In your output, that value is again a list of single-item mappings, ordered in the same way the keys of the original mappings first appear. This indicates that gathering that information should be done using a list or an OrderedDict.

The value of each of those output mappings is a list of the original values for its key, also in the order in which they appear. Those values at least partly repeat in the original, but must not repeat within a list in the target. Since the order needs preserving, a set (which would automatically filter duplicates) cannot be used. A list could be used instead, but that requires checking whether an item is already in the list. In the following I use another OrderedDict, abused as an "OrderedSet" by not looking at the values.
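The ordered-set idea described above can be tried in isolation, without any YAML involved. The following is only a minimal sketch under some assumptions: it relies on plain dicts preserving insertion order (guaranteed from Python 3.7), and the names pairs and group_in_order are hypothetical, with pairs a shortened, hand-written stand-in for the parsed list (including one artificially repeated pair to show the filtering).

```python
# Hypothetical in-memory stand-in for the parsed value of data['index']:
# a list of single-item mappings, in document order.
pairs = [
    {"BANKNIFTY_O_C_0_09_W": "books"},
    {"BANKNIFTY_O_C_0_09_W": "trends"},
    {"BANKNIFTY_O_P_0_09_W": "books"},
    {"BANKNIFTY_O_C_0_09_W": "books"},  # artificial repeat, should be dropped
    {"BANKNIFTY_O_P_0_09_W": "negrelations"},
]


def group_in_order(items):
    """Group values per key, keeping first-seen order and dropping repeats."""
    grouped = {}  # assumption: Python 3.7+, where dicts keep insertion order
    for item in items:
        for key, value in item.items():
            # The inner dict serves as an ordered set: only its keys matter.
            grouped.setdefault(key, {})[value] = None
    return [{key: list(values)} for key, values in grouped.items()]


result = group_in_order(pairs)
print(result)
```

On interpreters older than 3.7 the same sketch should work with collections.OrderedDict in place of the plain dicts.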

The complete program below assumes the input is in the file input.yaml:

import sys
import pathlib
from collections import OrderedDict
import ruamel.yaml

yaml_file = pathlib.Path('input.yaml')
yaml = ruamel.yaml.YAML()
yaml.default_flow_style = None  # block style, with leaf nodes in flow style
data = yaml.load(yaml_file)
indexed = OrderedDict()  # ticker -> OrderedDict abused as an ordered set
for elem in data['index']:
    for k in elem:  # just one each
        values = indexed.setdefault(k, OrderedDict())
        values[elem[k]] = None  # arbitrary value, unused
result = []  # single-item mappings, in order of first appearance
for key in indexed:
    result.append({key: list(indexed[key])})
data['index'] = result
yaml.dump(data, sys.stdout)

which gives:

index:
- BANKNIFTY_O_C_0_09_W: [books, trends, trades, relations]
- BANKNIFTY_O_P_0_09_W: [books, trends, trades, negrelations]
- BANKNIFTY_O_C_0_10_W: [books, trends, trades, relations, options_banknifty_weekly]
- BANKNIFTY_O_P_0_10_W: [books, trends, trades, negrelations]
- BANKNIFTY_F_0: [books, trends, trades, relations]
- NIFTY_F_0: [books, trends, trades, relations]

Setting yaml.default_flow_style = None is necessary because by default a YAML() instance dumps in block style, whereas your output has flow style on the leaf nodes. More fine-tuned control is possible in ruamel.yaml by not making "normal" dicts and lists, but subclassing the objects used internally for keeping round-trip information. In your case this is not necessary, as you want one of the three modes controlled by .default_flow_style (False: all block style, True: all flow style, None: block style with the leaf nodes in flow style).

Upvotes: 1
