Kalimantan

Match Phrase before and after colon

I have the following string:

'FIELDS--> FIELD1: Random Sentence  \r\n FIELD2: \r\nSOURCEHINT--> FIELD3: value.nested.value, FIELD4: 5.5.5.5, FIELD5: Longer Sentence, with more words-and punctation\r\n'

I want the following from the string above:

[FIELD1, Random Sentence]
[FIELD2, ]
[FIELD3, value.nested.value]
[FIELD4, 5.5.5.5]
[FIELD5, Longer Sentence, with more words-and punctation]

I still want the value even when it is empty, and I want the full sentences. The number of fields may vary as well. This is similar to "Match word before and after colon", but in this case I want the full sentence instead of just the word. Additionally, the field names can change, so they could be KEY3 instead of FIELD1.

I tried:

re.findall(r'(\w+) *:(?:(.*)?)', x)

It stops after the first match: it only outputs FIELD1 and captures everything after it as the value.
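The symptom comes from the greedy `(.*)` in the attempt: `.` matches every character except a newline, so the first value on a line swallows every field that follows it on that line. A minimal sketch, using only the FIELD3 to FIELD5 part of the input as an assumed single-line sample (the names `line` and `attempt` are just for illustration):

```python
import re

# Assumed sample: the single line of the question's input after "SOURCEHINT-->"
line = "FIELD3: value.nested.value, FIELD4: 5.5.5.5, FIELD5: Longer Sentence, with more words-and punctation"

# The attempted pattern: the greedy (.*) runs to the end of the line,
# so FIELD4 and FIELD5 end up inside FIELD3's value
attempt = re.findall(r'(\w+) *:(?:(.*)?)', line)
print(attempt)
# [('FIELD3', ' value.nested.value, FIELD4: 5.5.5.5, FIELD5: Longer Sentence, with more words-and punctation')]
```

Only one pair comes back, and since the pattern has no ` *` after the colon, the captured value also keeps its leading space.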


Answers (1)

Wiktor Stribiżew

It seems you may use

r'(\w+) *: *(.*?)(?=\s*(?:\w+:|$))'

See the regex demo
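A minimal sketch of applying the pattern above with Python's `re.findall`, assuming the input is the question's string written as a normal Python literal (so `\r\n` are real line breaks). The `re.M` flag is an assumption, since the answer does not mention flags. It lets `$` also match right before each line break; without it, the empty FIELD2 is not matched at all, because the text after `FIELD2: ` is a line break followed by `SOURCEHINT-->`, not a `KEY:`.

```python
import re

# The question's input, split across two literals only for line length
text = ('FIELDS--> FIELD1: Random Sentence  \r\n FIELD2: \r\nSOURCEHINT--> FIELD3: '
        'value.nested.value, FIELD4: 5.5.5.5, FIELD5: Longer Sentence, with more words-and punctation\r\n')

# The pattern from the answer, unchanged
pattern = r'(\w+) *: *(.*?)(?=\s*(?:\w+:|$))'

# re.M (assumed) makes $ match before every newline, so an empty value
# at the end of a line still satisfies the lookahead
pairs = re.findall(pattern, text, flags=re.M)
for key, value in pairs:
    print([key, value])
# ['FIELD1', 'Random Sentence']
# ['FIELD2', '']
# ['FIELD3', 'value.nested.value,']
# ['FIELD4', '5.5.5.5,']
# ['FIELD5', 'Longer Sentence, with more words-and punctation']
```

With this input, FIELD3 and FIELD4 keep a trailing comma: the lookahead only skips whitespace before the next `KEY:`, so the comma that separates fields on the same line stays in the captured value.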

Details

  • (\w+) - Group 1: one or more word chars
  • *: * - a : with zero or more spaces before and after it
  • (.*?) - Group 2: any chars other than line break chars, 0 or more repetitions, as few as possible, up to the first position where the next lookahead succeeds
  • (?=\s*(?:\w+:|$)) - a positive lookahead: 0+ whitespaces followed by either 1+ word chars and a :, or the end of the string.
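If the separating commas should not end up in the values, one possible tweak is to let the lookahead consume an optional comma before the next `KEY:`, and to end a value at a line break or the end of the string. This is a hypothetical variant, not the pattern given above, and it again assumes the `re.M` flag:

```python
import re

# The question's input, split across two literals only for line length
text = ('FIELDS--> FIELD1: Random Sentence  \r\n FIELD2: \r\nSOURCEHINT--> FIELD3: '
        'value.nested.value, FIELD4: 5.5.5.5, FIELD5: Longer Sentence, with more words-and punctation\r\n')

# Hypothetical variant of the answer's lookahead:
#   ,?\s*\w+:  - stop before an optional comma, whitespace and the next KEY:
#   \s*$       - or stop before trailing whitespace at a line end (re.M) / end of string
pattern = re.compile(r'(\w+) *: *(.*?)(?=,?\s*\w+:|\s*$)', re.M)

fields = [[key, value] for key, value in pattern.findall(text)]
for field in fields:
    print(field)
# ['FIELD1', 'Random Sentence']
# ['FIELD2', '']
# ['FIELD3', 'value.nested.value']
# ['FIELD4', '5.5.5.5']
# ['FIELD5', 'Longer Sentence, with more words-and punctation']
```

Both this variant and the original pattern assume a value never contains a word immediately followed by a colon; such a value would be cut at that point and the word treated as the next key.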
