titipata

Reputation: 5389

Split with multiple delimiters while keeping delimiters as dictionary keys

I would like to split a text on multiple given delimiters. However, instead of getting back a plain list, I would like to keep each delimiter as a dictionary key, with the text that follows it as the value.

I tried splitting sequentially on each delimiter in my list, but that produces a list of lists, so I tried a regular expression (re) instead. With re, however, I cannot keep track of which delimiter preceded each piece after splitting. Is there a way to split my string on the delimiters while keeping them as keys?

Here is my current solution using re, which returns a plain list.

import re

abstract = """
BACKGROUND\nN-Methyl-D-aspartate (NMDA) receptors are glutamate-activated ion channels that are assembled from NR1 and NR2 subunits. 
These receptors are highly enriched in brain neurons and are considered to be an important target for the acute and chronic effects of ethanol. 
NR2 subunits (A-D) arise from separate genes and are expressed in a developmental and brain region-specific manner. 
The NR1 subunit has 8 isoforms that are generated by alternative splicing of a single gene. 
The heteromeric subunit makeup of the NMDA receptor determines the pharmacological and biophysical properties of the receptor and provides for functional receptor heterogeneity. 
Although results from previous studies suggest that NR2 subunits affect the ethanol sensitivity of NMDA receptors, the role of the NR1 subunit and its multiple splice variants is less well known.
\n\n\nMETHODS\nIn this study, all 8 NR1 splice variants were individually coexpressed with each NR2 subunit in human embryonic kidney 293 (HEK293) cells and tested for inhibition by ethanol using patch-clamp electrophysiology.
\n\n\nRESULTS\nAll 32 subunit combinations tested gave reproducible glutamate-activated currents and all receptors were inhibited to some degree by 100 mM ethanol. 
The sensitivity of individual receptors to ethanol was affected by the specific NR1 splice variant expressed with receptors containing the NR1-3 and NR1-4 subunits among the least inhibited by ethanol.
\n\n\nCONCLUSIONS\nThese results suggest that regional, developmental, or compensatory changes in the expression of NR1 splice variants may significantly affect ethanol inhibition of NMDA receptors.
"""

delimiters = ['BACKGROUND\n', 'CONCLUSIONS\n', 'OBJECTIVES\n',
              'METHODS\n', 'OBJECTIVE\n', 'RESULTS\n']

sections = re.split('|'.join(delimiters), abstract)

Output

['',
 'N-Methyl-D-as ...',
 'In this study, ...', 
 'All 32 subunit ...', 
 ...]

Desired output

{'BACKGROUND\n': 'N-Methyl-D-as ...',
 'METHODS\n': 'In this study, ...', 
 ...}

Upvotes: 0

Views: 542

Answers (1)

Philip Tzou

Reputation: 6438

import re

abstract = """
BACKGROUND\nN-Methyl-D-aspartate (NMDA) receptors are glutamate-activated ion channels that are assembled from NR1 and NR2 subunits. 
These receptors are highly enriched in brain neurons and are considered to be an important target for the acute and chronic effects of ethanol. 
NR2 subunits (A-D) arise from separate genes and are expressed in a developmental and brain region-specific manner. 
The NR1 subunit has 8 isoforms that are generated by alternative splicing of a single gene. 
The heteromeric subunit makeup of the NMDA receptor determines the pharmacological and biophysical properties of the receptor and provides for functional receptor heterogeneity. 
Although results from previous studies suggest that NR2 subunits affect the ethanol sensitivity of NMDA receptors, the role of the NR1 subunit and its multiple splice variants is less well known.
\n\n\nMETHODS\nIn this study, all 8 NR1 splice variants were individually coexpressed with each NR2 subunit in human embryonic kidney 293 (HEK293) cells and tested for inhibition by ethanol using patch-clamp electrophysiology.
\n\n\nRESULTS\nAll 32 subunit combinations tested gave reproducible glutamate-activated currents and all receptors were inhibited to some degree by 100 mM ethanol. 
The sensitivity of individual receptors to ethanol was affected by the specific NR1 splice variant expressed with receptors containing the NR1-3 and NR1-4 subunits among the least inhibited by ethanol.
\n\n\nCONCLUSIONS\nThese results suggest that regional, developmental, or compensatory changes in the expression of NR1 splice variants may significantly affect ethanol inhibition of NMDA receptors.
"""

delimiters = ['BACKGROUND\n', 'CONCLUSIONS\n', 'OBJECTIVES\n',
              'METHODS\n', 'OBJECTIVE\n', 'RESULTS\n']

values = re.split('|'.join(delimiters), abstract)
values.pop(0)  # remove the initial empty string
keys = re.findall('|'.join(delimiters), abstract)
output = dict(zip(keys, values))

print(output)
# {'BACKGROUND\n': 'N-Methyl-D-aspartate (NMDA) receptors are glutamate-activated ion channels that are assembled from NR1 and NR2 subunits. \nThese receptors are highly enriched in brain neurons and are considered to be an important target for the acute and chronic effects of ethanol. \nNR2 subunits (A-D) arise from separate genes and are expressed in a developmental and brain region-specific manner. \nThe NR1 subunit has 8 isoforms that are generated by alternative splicing of a single gene. \nThe heteromeric subunit makeup of the NMDA receptor determines the pharmacological and biophysical properties of the receptor and provides for functional receptor heterogeneity. \nAlthough results from previous studies suggest that NR2 subunits affect the ethanol sensitivity of NMDA receptors, the role of the NR1 subunit and its multiple splice variants is less well known.\n\n\n\n', 'METHODS\n': 'In this study, all 8 NR1 splice variants were individually coexpressed with each NR2 subunit in human embryonic kidney 293 (HEK293) cells and tested for inhibition by ethanol using patch-clamp electrophysiology.\n\n\n\n', 'RESULTS\n': 'All 32 subunit combinations tested gave reproducible glutamate-activated currents and all receptors were inhibited to some degree by 100 mM ethanol. \nThe sensitivity of individual receptors to ethanol was affected by the specific NR1 splice variant expressed with receptors containing the NR1-3 and NR1-4 subunits among the least inhibited by ethanol.\n\n\n\n', 'CONCLUSIONS\n': 'These results suggest that regional, developmental, or compensatory changes in the expression of NR1 splice variants may significantly affect ethanol inhibition of NMDA receptors.\n'}
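A possible single-pass variant of the same idea, as a sketch only: when the pattern is wrapped in a capturing group, re.split also returns the matched delimiters, so a separate re.findall call is not needed. The function name split_sections and the short sample string are made up for illustration. re.escape and the longest-first ordering are defensive assumptions, in case a delimiter ever contains regex metacharacters or one delimiter is a prefix of another.

```python
import re

def split_sections(text, delimiters):
    # Hypothetical helper: maps each delimiter found in text to the text after it.
    # Escape every delimiter and try longer ones first (defensive assumption).
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = '(' + '|'.join(re.escape(d) for d in ordered) + ')'
    # With a capturing group, re.split keeps the matched delimiters:
    # [text_before_first_delimiter, delim1, body1, delim2, body2, ...]
    parts = re.split(pattern, text)
    # parts[0] is whatever precedes the first delimiter, so skip it and
    # pair each delimiter (odd index) with the body that follows it (even index).
    return dict(zip(parts[1::2], parts[2::2]))

sample = "BACKGROUND\nfirst part\n\nMETHODS\nsecond part\n"
print(split_sections(sample, ['BACKGROUND\n', 'METHODS\n', 'RESULTS\n']))
# {'BACKGROUND\n': 'first part\n\n', 'METHODS\n': 'second part\n'}
```

As with the dict(zip(...)) approach above, a heading that appears more than once in the text keeps only its last section, because dictionary keys are unique. Calling .strip() on the values would remove the trailing newlines if they are not wanted.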

Upvotes: 3
