Mary
Mary

Reputation: 23

Excel - Extract string between two strings

I have a vector of strings of the following kind:

{'id': '123', 'text': 'What should I buy today?', 'choices':[{'value': 'Bread', 'label': 'You need to buy bread', 'selected': True}, {'value': 'Milk', 'label': 'You need to buy milk', 'selected': False}, {'value': 'Chips', 'label': "You need to buy chips", 'selected': False}]}

I would like to create a column made of only the sentences which end with

'selected': True

For example, regarding the above string I would get

You need to buy bread

As of now, I could only write the following command:

=MID(LEFT(A1,FIND(", 'selected': True",G5)-1),FIND("label':",G5)+7,LEN(G5))

But, in case the action selected ('selected': True) is not the first one of the list, it won't work.


New added:

enter image description here

Upvotes: 2

Views: 1123

Answers (2)

bosco_yip
bosco_yip

Reputation: 3802

Here is another formula option and without LET function for all Excel versions.

The criteria ('selected': True) can be put in the 1st, 2nd or last position of the list as per following example.

In C2, formula copied down :

=TRIM(RIGHT(SUBSTITUTE(LEFT(SUBSTITUTE(A2,"""","'"),FIND(", 'selected': True",A2)-2),"'",REPT(" ",99)),99))

enter image description here

Upvotes: 2

mark fitzpatrick
mark fitzpatrick

Reputation: 3310

With Office 365, you can do:

=LET( line, D3,
       sentences, MID( line, FIND( "[", line ), 99999),
       sarr, FILTERXML( "<m><s>" & SUBSTITUTE( sentences, "{", "</s><s>" ) & "</s></m>", "//s" ),
       larr, INDEX( sarr, FILTER( SEQUENCE( ROWS( sarr ) ), ISNUMBER( FIND( "'selected': True", sarr ) ) ) ),
       s, FIND( "'label': '", larr)+10, f, FIND( "',", larr, s ),
       MID( larr, s, f-s )  )

Where D3 is your target cell with your original line.

I suspect there is an easier way with a better FILTERXML hack, but it does not come to me at the moment. If JvdV is on line, he would know. Also, P.b. would probably do this in a better way.

NB: you must be consistent with string definitions. You must use ' as opposed to ".

enter image description here

Another approach is to force the conversions of " to '. This will do that, but it may have unintended consequences:

=LET( line, D3,
       sentences, MID( SUBSTITUTE( line,"""","'"), FIND( "[", line ), 99999),
       sarr, FILTERXML( "<m><s>" & SUBSTITUTE( sentences, "{", "</s><s>" ) & "</s></m>", "//s" ),
       larr, INDEX( sarr, FILTER( SEQUENCE( ROWS( sarr ) ), ISNUMBER( FIND( "'selected': True", sarr ) ) ) ),
       s, FIND( "'label': '", larr)+10, f, FIND( "',", larr, s ),
       MID( larr, s, f-s )  )

Upvotes: 2

Related Questions