How do I find a substring between two characters using Prolog?

Question

I have a variable being passed to a predicate that is a list of strings.

From each string in the list I want to extract the substring between the deepest set of parentheses and create a list of all of those substrings.

For example:

Input:

["Canidae(Canis(C. lupus(C. l. familiaris)))", "Felidae(Felinae(Felis(F. catus)))", "Equidae(Equus(E. ferus(E. f. caballus)))"]

Ouput:

["C. l. familiaris", "F. catus", "E. f. caballus"]

(I have used biological classification ranks as an example since they are similar in structure to my actual data)

Finally, the depth of each set of parentheses is unknown and the deepest substring is always the only substring between an open parentheses and a closed parentheses.

Thanks for your help with this, I'm new to Prolog so the way of thinking is a little different. I've been trying to get my head around this problem for a while but I can't work it out.

willeM_ Van Onsem · Accepted Answer

Instead of solving the problem for a list, we better first solve the problem for a single string. In order to calculate the deepest substring, we can - if I understood the specifications correctly - calculate the position of the last opening parenthesis. We will assume you have transformed the string into a list of ASCII codes using for instance string_codes/2. Here the opening bracket has code 40:

last_opening(L,X) :-
    last_opening(L,0,0,X).

last_opening([],J,_,J).
last_opening([40|T],_,I,X) :-
    !,
    I1 is I+1,
    last_opening(T,I1,I1,X).
last_opening([_|T],J,I,X) :-
    I1 is I+1,
    last_opening(T,J,I1,X).

For example for your first example:

?- string_codes("Canidae(Canis(C. lupus(C. l. familiaris)))",L),last_opening(L,X).
L = [67, 97, 110, 105, 100, 97, 101, 40, 67|...],
X = 23.

It says we have to start extracting from position 23:

Canidae(Canis(C. lupus(C. l. familiaris)))
                       ^ here

Once we know where the deepest substring starts, we can extract the string: we simply have to stop at the end of the list, or at code 41, whatever comes first:

extract_substring(L,0,S) :-
    !,
    extract_substring2(L,S).
extract_substring([_|T],N,S) :-
    N1 is N-1,
    extract_substring(T,N1,S).

extract_substring2([],[]).
extract_substring2([41|_],[]) :-
    !.
extract_substring2([L|T],[L|U]) :-
    extract_substring2(T,U).

For example:

?- string_codes("Canidae(Canis(C. lupus(C. l. familiaris)))",L),last_opening(L,X),extract_substring(L,X,T),string_codes(St,T).
L = [67, 97, 110, 105, 100, 97, 101, 40, 67|...],
X = 23,
T = [67, 46, 32, 108, 46, 32, 102, 97, 109|...],
St = "C. l. familiaris".

Now we can write a predicate that does the string_codes calling automatically:

deepest_string(S,T) :-
    string_codes(S,CS),
    last_opening(CS,X),
    extract_substring(CS,X,CT),
    string_codes(T,CT).

For example:

?- deepest_string("Canidae(Canis(C. lupus(C. l. familiaris)))",L).
L = "C. l. familiaris".

To conclude we only have to implement the function over a list:

deepest_string_list([],[]).
deepest_string_list([S|ST],[T|TT]) :-
    deepest_string(S,T),
    deepest_string_list(ST,TT).

resulting in:

?- deepest_string_list(["Canidae(Canis(C. lupus(C. l. familiaris)))", "Felidae(Felinae(Felis(F. catus)))", "Equidae(Equus(E. ferus(E. f. caballus)))"],T).
T = ["C. l. familiaris", "F. catus", "E. f. caballus"].

If you want to alter the characters, you can simply lookup their ASCII equivalent and put these instead of 40 and 41.

How do I find a substring between two characters using Prolog?

Answers (2)

Related Questions