ejbs
ejbs

Reputation: 390

Split string into words in Prolog

I wrote the following code to split a CharList (list of integers) to a list of words. I've run it with trace on and it does all the work and puts it all in a list but it doesn't bind the WordList variable to the value.

splitintowords(CharList, WordList):- splitintowords_(CharList, [], [], WordList).
splitintowords_([],[], Res, Res).
splitintowords_([],Word, WordList,_):-
    \+ Word = [],
    append([Word], WordList, NWordList),
    splitintowords_([],[], NWordList,_).
splitintowords_([H|T],Word, WordList,_):-
    alphabetic(H),
    append([H], Word, NWord),
    splitintowords_(T,NWord, WordList,_).
splitintowords_([H|T],Word, WordList,_):-
    \+ alphabetic(H),
    append([Word], WordList, NWordList),
    splitintowords_(T, [], NWordList,_).

Test case using Swipl:

string_codes('hello there, how are you?',T), splitintowords(T,  W).

Here's an alternative implementation that I wrote first, this one doesn't work either:

splitintowords(CharList, WordList):-
    splitintowords_(CharList, [[]], WordList).
splitintowords_([], WordList, WordList).
splitintowords_([H|T], [Word|Words], _):-
    alphabetic(H),
    WordN = [H|Word],
    splitintowords_(T, [WordN|Words], _).
splitintowords_([H|T], Words):-
    \+ alphabetic(H),
    WordsN = [[]|Words],
    splitintowords(T, WordsN).

This one is simpler and seems to do the equivalent thing with trace.

Upvotes: 1

Views: 1165

Answers (3)

coder
coder

Reputation: 12972

For your first solution the problem is that though you build the right list, you pass in each step an anonymous variable '_' which unifies with anything, instead of the name of the list-variable you want to return in the end.

I tried your solution with build in predicate is_alpha/1 instead of alphabetic since you haven;t given the code, but it should be the same.... The code with some changes:

splitintowords(CharList, WordList):- splitintowords_(CharList, [], [], WordList).

splitintowords_([],[], Res, Res).
splitintowords_([],Word, WordList,L):-
    \+ Word = [],
    append([Word], WordList, NWordList),
    splitintowords_([],[], NWordList,L).
splitintowords_([H|T],Word, WordList,L):-
    is_alpha(H),
    append([H], Word, NWord),
    splitintowords_(T,NWord, WordList,L).
splitintowords_([H|T],Word, WordList,L):-
    \+ is_alpha(H),
    append([Word], WordList, NWordList),
    splitintowords_(T, [], NWordList,L).

Example:

 string_codes('hello there, how are you?',T), splitintowords(T,  W).
T = [104, 101, 108, 108, 111, 32, 116, 104, 101|...],
W = [[117, 111, 121], [101, 114, 97], [119, 111, 104], [], [101, 114, 101, 104|...], [111, 108, 108|...]] ;
false.

Note that because the solution is very large it shows ...., you can see full solution by pressing w in swi-prolog:

?- string_codes('hello there, how are you?',T), splitintowords(T,  W).
T = [104, 101, 108, 108, 111, 32, 116, 104, 101|...],
W = [[117, 111, 121], [101, 114, 97], [119, 111, 104], [], [101, 114, 101, 104|...], [111, 108, 108|...]] [write]
T = [104, 101, 108, 108, 111, 32, 116, 104, 101, 114, 101, 44, 32, 104, 111, 119, 32, 97, 114, 101, 32, 121, 111, 117, 63],
W = [[117, 111, 121], [101, 114, 97], [119, 111, 104], [], [101, 114, 101, 104, 116], [111, 108, 108, 101, 104]] ;
false.

(second T,W are the full solutions of first T,W).

Also with some similar changes you second solution:

splitintowords(CharList, WordList):-
    splitintowords_(CharList, [[]], WordList).

splitintowords_([], WordList, WordList).

splitintowords_([H|T], [Word|Words], L):-
    is_alpha(H),
    WordN = [H|Word],
    splitintowords_(T, [WordN|Words], L).

splitintowords_([H|T], Words,L):-
    \+ is_alpha(H),
    WordsN = [[]|Words],
    splitintowords_(T, WordsN,L).

Example:

?- string_codes('hello there, how are you?',T), splitintowords(T,  W).
T = [104, 101, 108, 108, 111, 32, 116, 104, 101, 114, 101, 44, 32, 104, 111, 119, 32, 97, 114, 101, 32, 121, 111, 117, 63],
W = [[], [117, 111, 121], [101, 114, 97], [119, 111, 104], [], [101, 114, 101, 104, 116], [111, 108, 108, 101, 104]] ;
false.

Note that somewhere in the middle it has one empty list more than your previous solution. If you want to get rid of empty lists it would be very easy by writing one more predicate:

delete([],[]).
delete([H|T],[H|T1]):-dif(H,[]),delete(T,T1).
delete([[]|T],T1):-delete(T,T1).

Example:

?- delete([[], [117, 111, 121], [101, 114, 97], [119, 111, 104], [], [101, 114, 101, 104, 116], [111, 108, 108, 101, 104]],L).
L = [[117, 111, 121], [101, 114, 97], [119, 111, 104], [101, 114, 101, 104, 116], [111, 108, 108, 101, 104]] ;
false.

(It removed empty lists).

EDIT

I also tested it with your predicate alphabetic/1 and produces exactly the same output as with is_alpha/1.

Upvotes: 1

user1812457
user1812457

Reputation:

I must be misunderstanding the question. I will use chars instead of codes, to make it easier to see on the top level what is going on. If I understand the wording and the attempt to code it in Prolog, you are trying to do:

?- string_chars("hello there, how are you?", Chars),
   chars_words(Chars, Words).
Chars = [h, e, l, l, o, ' ', t, h, e|...],
Words = [[h, e, l, l, o], [t, h, e, r, e], [h, o, w], [a, r, e], [y, o, u]].

Here, you have taken all runs of consecutive alpha characters and put then together as words; and you have discarded all non-alpha characters in between them.

With this assumption, here is how you can approach it:

  1. strip leading non-alpha chars
  2. the consecutive alpha chars at the front are a word
  3. discard trailing non-alpha chars and go to 2

In Prolog:

chars_words([], []).
chars_words([C|Cs], Words) :-
        strip_non_alpha([C|Cs], Rest),
        chars_words_aux(Rest, Words).

chars_words_aux([], []).
chars_words_aux([C|Cs], [W|Ws]) :-
        word_rest([C|Cs], W, Rest0),
        strip_non_alpha(Rest0, Rest),
        chars_words_aux(Rest, Ws).

There is no need to use append/3 at any point, because the words you are not really changing the order of the chars from the original list to the list of words.

I have used a predicate called partition_sorted/4 to define both word_rest/3 and strip_non_alpha/2. It takes a list and splits it in two: a front for which a predicate succeeds, and a rest (the first element of the rest is the first element in the original list for which the predicate fails).

strip_non_alpha(List, Rest) :-
        partition_sorted(not_alpha, List, _, Rest).

word_rest(List, Word, Rest) :-
        partition_sorted(alpha, List, Word, Rest).

not_alpha(X) :- \+ char_type(X, alpha).
alpha(X) :- char_type(X, alpha).

Finally, a naive definition for partition_sorted/4:

partition_sorted(Goal, List, True, False) :-
        partition_sorted_aux(List, Goal, True, False).

partition_sorted_aux([], _, [], []).
partition_sorted_aux([X|Xs], Goal, True, False) :-
        (       call(Goal, X)
        ->      True = [X|True0],
                partition_sorted_aux(Xs, Goal, True0, False)
        ;       True = [], False = [X|Xs]
        ).

This definition is naive because it only works properly if Goal succeeds or fails once without leaving behind choice points, and the input list is ground.

Upvotes: 1

Floctioncers
Floctioncers

Reputation: 41

Here's my implementation using difference lists for better speed. It works pretty straighforwardly, takes a head from Chars, checks if H can be unified with any split char, if so, it creates a new word and add it into Bcu, if no, it will append the head to the end of Acu.

%split_words(+Chars, +Splits, -Words).
% it needs list of Chars, list of Split characters, and returns list of
% words.
split_words(Chars, Splits, Words) :-
  split_words(Chars, Splits, X - X, Y - Y, Words).
split_words([], _, [], Bcu, Words) :- listify(Bcu, Words).
split_words([], _, Acu, Bcu, Words) :-
  listify(Acu, W), add(Bcu, W, New), listify(New, Words).
split_words([H|C], Splits, Acu, Bcu, Words) :-
  member(H, Splits), !, listify(Acu, W), add(Bcu, W, New),
  split_words(C, Splits, X - X, New, Words).
split_words([H|C], Splits, Acu, Bcu, Words) :-
 add(Acu, H, New), split_words(C, Splits, New, Bcu, Words).

add(X - [I|Y], I, X - Y).

listify(X - [], X).

Upvotes: 1

Related Questions