kyle zim

Reputation: 3

split by regex and add matches to dictionary

First time posting here.

I'd like to 1) parse the following text: "keyword: some keywords concept :some concepts"

and 2) store the result in a dictionary: {'keyword': 'some keywords', 'concept': 'some concepts'}.

There may be zero or one space before each colon. Here is what I've tried so far:

import re

sample_text = "keyword: some keywords concept :some concepts"

p_res = re.compile(r"(\S+\s?):").split(sample_text)  # Task 1

d_inc = dict(zip(p_res[::2], p_res[1::2]))  # Task 2

However, the resulting list p_res is wrong: it has an empty entry at index 0, which in turn produces the wrong dict. Is there something wrong with my regex?
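To see where the empty entry comes from, it helps to print what re.split actually returns. A minimal sketch using the question's sample text and pattern (Python 3 assumed):

```python
import re

sample_text = "keyword: some keywords concept :some concepts"

# Same pattern as in the question. Because it contains a capturing group,
# re.split keeps the captured keys in the result list.
parts = re.split(r"(\S+\s?):", sample_text)
print(parts)
# ['', 'keyword', ' some keywords ', 'concept ', 'some concepts']
```

The pattern matches at position 0, so the text before the first match is the empty string, and re.split always puts that leading text at index 0. Every key and value is then shifted by one position, which is why pairing even and odd indices goes wrong. The pieces also keep the whitespace around the colon (' some keywords ', 'concept ').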

Upvotes: 0

Views: 2893

Answers (2)

Yaakov Belch

Reputation: 4863

Simply replace Task 1 with this line:

p_res = re.compile(r"(\S+\s?):").split(sample_text)[1:]  # Task 1

This will always ignore the (normally empty) element that is returned by re.split.
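A runnable sketch of the full fix, assuming Python 3.7+ for the printed key order. With only [1:], the keys and values still carry the whitespace captured around the colon (for example 'concept '), so this sketch additionally strips each piece. The stripping is an addition, not part of the answer above.

```python
import re

sample_text = "keyword: some keywords concept :some concepts"

# Task 1: split on "key:" and drop the (normally empty) leading element.
p_res = re.compile(r"(\S+\s?):").split(sample_text)[1:]

# Pairing as-is leaves whitespace in both keys and values:
# {'keyword': ' some keywords ', 'concept ': 'some concepts'}
raw = dict(zip(p_res[::2], p_res[1::2]))

# Task 2: pair keys (even indices) with values (odd indices),
# stripping the whitespace the split leaves around them.
d_inc = {k.strip(): v.strip() for k, v in zip(p_res[::2], p_res[1::2])}
print(d_inc)
# {'keyword': 'some keywords', 'concept': 'some concepts'}
```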

Background: Why does re.split return the empty first result?

What should the program do with this input:

sample_text = "Hello! keyword: some keywords concept :some concepts"

The text Hello! at the beginning of the input doesn't fit into the definition of your problem (which assumes that the input starts with a key).
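Running the question's split on this input shows where that text ends up: in the first element, which is empty only when the input begins with a key. A short sketch, reusing the question's pattern:

```python
import re

sample_text = "Hello! keyword: some keywords concept :some concepts"

# The text before the first "key:" match lands at index 0.
parts = re.split(r"(\S+\s?):", sample_text)
print(parts)
# ['Hello! ', 'keyword', ' some keywords ', 'concept ', 'some concepts']
```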

Do you want to ignore it? Do you want to raise an exception if it appears? Do you want to add it to your dictionary with a special key?

re.split doesn't want to decide this for you: it returns everything it finds, and you make the decision. In our solution, we simply ignore whatever appears before the first key.
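The three choices above can be made explicit in code. The sketch below is a hypothetical helper: the name parse_pairs, the leading parameter, and the use of None as the "special key" are illustrative assumptions, not part of this answer. None is chosen because it cannot collide with any string key produced by the split.

```python
import re

# Same split pattern as in the question.
KEY_SPLIT = re.compile(r"(\S+\s?):")


def parse_pairs(text, leading="ignore"):
    """Parse 'key: value' text into a dict (hypothetical helper).

    `leading` decides what to do with non-blank text before the first key:
      "ignore" - drop it (what the [1:] fix does)
      "error"  - raise ValueError
      "keep"   - store it under the special key None
    """
    if leading not in ("ignore", "error", "keep"):
        raise ValueError(f"unknown policy: {leading!r}")
    head, *rest = KEY_SPLIT.split(text)
    result = {}
    if head.strip():
        if leading == "error":
            raise ValueError(f"text before first key: {head!r}")
        if leading == "keep":
            result[None] = head.strip()
    # Keys sit at even positions of `rest`, values at odd positions.
    for key, value in zip(rest[::2], rest[1::2]):
        result[key.strip()] = value.strip()
    return result


text = "Hello! keyword: some keywords concept :some concepts"
print(parse_pairs(text))
# {'keyword': 'some keywords', 'concept': 'some concepts'}
print(parse_pairs(text, leading="keep"))
# {None: 'Hello!', 'keyword': 'some keywords', 'concept': 'some concepts'}
```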

Upvotes: 0

Avinash Raj

Reputation: 174874

Use re.findall to capture the groups of every match as a list of tuples, then apply dict to convert that list of tuples into a dictionary.

>>> import re
>>> s = 'keyword: some keywords concept :some concepts'
>>> dict(re.findall(r'(\S+)\s*:\s*(.*?)\s*(?=\S+\s*:|$)', s))
{'concept': 'some concepts', 'keyword': 'some keywords'}
>>> 

The regex above captures each key and its corresponding value in two separate groups.
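To make the pattern easier to follow, here is the same regex rewritten with re.VERBOSE so that each part carries a comment (whitespace inside a verbose pattern is ignored, so it matches exactly like the one-line version). The printed key order assumes Python 3.7+, where dicts preserve insertion order.

```python
import re

# Equivalent to r'(\S+)\s*:\s*(.*?)\s*(?=\S+\s*:|$)'.
PAIR = re.compile(r"""
    (\S+)          # group 1: the key, a run of non-whitespace characters
    \s* : \s*      # the colon, with optional whitespace on either side
    (.*?)          # group 2: the value, matched lazily (as short as possible)
    \s*            # trailing whitespace is consumed but not captured
    (?=            # stop only where the lookahead succeeds:
        \S+ \s* :  #   the next 'key:' begins here,
      | $          #   or the end of the string is reached
    )
""", re.VERBOSE)

s = 'keyword: some keywords concept :some concepts'
print(PAIR.findall(s))
# [('keyword', 'some keywords'), ('concept', 'some concepts')]
print(dict(PAIR.findall(s)))
# {'keyword': 'some keywords', 'concept': 'some concepts'}
```

The lazy value group combined with the lookahead is what makes the value stop right before the next key instead of running to the end of the string.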

This assumes that the input string contains only key-value pairs and that keys contain no whitespace.
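If the no-whitespace-in-keys assumption does not hold, the regex does not fail loudly: a key such as "key word" is silently truncated to its last word, and the text before it is dropped. A quick check with a hypothetical input:

```python
import re

pattern = r'(\S+)\s*:\s*(.*?)\s*(?=\S+\s*:|$)'

# A key containing a space violates the assumption above.
print(re.findall(pattern, 'key word: x'))
# [('word', 'x')]
```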


Upvotes: 3
