Reputation: 90
Prolog newbie here. In SWI Prolog, I'm trying to figure out how to parse a simple line of CSV reversibly, but I'm stuck. Here's what I've got:
csvstring1(S, L) :-
    split_string(S, ',', ',', T),
    maplist(atom_number, T, L).
csvstring2(S, L) :-
    atomic_list_concat(T, ',', S),
    maplist(atom_number, T, L).
% This one is the same as csvstring2, except that maplist comes first.
csvstring3(S, L) :-
    maplist(atom_number, T, L),
    atomic_list_concat(T, ',', S).
Now csvstring1 and csvstring2 work in a "forward" manner:
?- csvstring1('1,2,3,4', L).
L = [1, 2, 3, 4].
?- csvstring2('1,2,3,4', L).
L = [1, 2, 3, 4].
But not csvstring3:
?- csvstring3('1,2,3,4', L).
ERROR: Arguments are not sufficiently instantiated
Conversely, csvstring3 works in reverse, but the other two predicates do not:
?- csvstring3(L, [1,2,3,4]).
L = '1,2,3,4'.
?- csvstring1(L, [1,2,3,4]).
ERROR: Arguments are not sufficiently instantiated
?- csvstring2(L, [1,2,3,4]).
ERROR: Arguments are not sufficiently instantiated
How can I combine these into a single predicate?
Upvotes: 1
Views: 128
Reputation: 28983
I don't know of a particularly newbie-friendly way to do it that doesn't compromise somewhere. This is the easiest:
csvString_list(String, List) :-
    ground(String),
    atomic_list_concat(Temp, ',', String),
    maplist(atom_number, Temp, List).
csvString_list(String, List) :-
    ground(List),
    maplist(atom_number, Temp, List),
    atomic_list_concat(Temp, ',', String).
but it makes and leaves spurious choicepoints, which is mildly annoying.
This version cuts the choicepoint, which makes it nicer to use, but cutting is a poor practice to get into unless you are aware of what the cut throws away:
csvString_list(String, List) :-
    ground(String),
    atomic_list_concat(Temp, ',', String),
    maplist(atom_number, Temp, List),
    !.
csvString_list(String, List) :-
    ground(List),
    maplist(atom_number, Temp, List),
    atomic_list_concat(Temp, ',', String).
This uses if/else which is less code:
csvString_list(String, List) :-
    (   ground(String)
    ->  atomic_list_concat(Temp, ',', String), maplist(atom_number, Temp, List)
    ;   maplist(atom_number, Temp, List), atomic_list_concat(Temp, ',', String)
    ).
but it is logically impure. The cleaner approach is to reify the branching with if_/3, which isn't built into SWI-Prolog (it comes from library(reif)) and is less simple to use.
Or you could write a grammar with a DCG, which is not newbie territory:
:- set_prolog_flag(double_quotes, chars).
:- use_module(library(dcg/basics)).
csvTail([N|Ns]) --> [','], number(N), csvTail(Ns).
csvTail([]) --> [].
csv([N|Ns]) --> number(N), csvTail(Ns).
e.g.
?- phrase(csv(Ns), "11,22,33,44,55").
Ns = [11, 22, 33, 44, 55]
?- phrase(csv([11, 22, 33, 44, 55]), String).
String = [49, 49, ',', 50, 50, ',', 51, 51, ',', 52, 52, ',', 53, 53]
but now you're back to spurious choicepoints while parsing, and you have to deal with the historic split between strings, atoms, and character codes in SWI-Prolog: the generated list mixes character codes (49 is the code for '1') with the char ',', so it neither looks like "11,22,33,44,55" nor unifies with it directly while the double_quotes flag is set to chars.
Upvotes: 0
Reputation: 60014
How can I combine these into a single predicate?
csvstring(S, L) :-
    (   ground(S)
    ->  atomic_list_concat(T, ',', S),
        maplist(atom_number, T, L)
    ;   maplist(atom_number, T, L),
        atomic_list_concat(T, ',', S)
    ).
... micro test ...
?- csvstring('1,2,3,4', L).
L = [1, 2, 3, 4].
?- csvstring(L, [1,2,3,4]).
L = '1,2,3,4'.
Upvotes: 1
Reputation: 2422
Others have given some advice and a lot of code. With SWI-Prolog, to parse comma-separated integers, you would use library(dcg/basics) and library(dcg/high_order) to do that trivially:
?- use_module(library(dcg/basics)),
use_module(library(dcg/high_order)),
portray_text(true).
true.
?- phrase(sequence(integer, ",", Ns), `1,2,3,4`).
Ns = [1, 2, 3, 4].
?- phrase(sequence(integer, ",", [-7,6,42]), S).
S = `-7,6,42`.
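To get the single atom-to-list predicate the question asks for, the two queries above could be wrapped roughly as follows. This is only a sketch: the name csv_atom_integers/2 is made up for illustration, and the once/1 calls are an added assumption to commit to the first solution; the grammar itself is the same sequence//3 call shown above.

```prolog
:- use_module(library(dcg/basics)).      % integer//1
:- use_module(library(dcg/high_order)).  % sequence//3

% csv_atom_integers(?Atom, ?Integers)
% Hypothetical wrapper (not part of any library): relates an atom such as
% '1,2,3,4' to the list of integers it contains. At least one argument
% must be instantiated.
csv_atom_integers(Atom, Integers) :-
    (   nonvar(Atom)
    ->  % Parsing: turn the atom into codes, then run the grammar.
        atom_codes(Atom, Codes),
        once(phrase(sequence(integer, ",", Integers), Codes))
    ;   % Generating: run the grammar first, then build the atom.
        once(phrase(sequence(integer, ",", Integers), Codes)),
        atom_codes(Atom, Codes)
    ).
```

The order of the two goals still depends on which argument is bound, as in the other answers; the difference is that the grammar does the number conversion in both directions.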
Of course, if you are trying to parse real CSV files, you should be using a CSV parser. Here is a minimal example of reading a CSV file and writing its output as a TSV (tab-separated) file. If this is your input in a file called example.csv:
$ cat example.csv
id,name,salary,department
1,john,2000,sales
2,Andrew,5000,finance
3,Mark,8000,hr
4,Rey,5000,marketing
5,Tan,4000,IT
You can read it from the file and write it with tabs as separators like this:
?- csv_read_file('example.csv', Data),
csv_write_file('example.tsv', Data).
Data = [row(id, name, salary, department),
row(1, john, 2000, sales),
row(2, 'Andrew', 5000, finance),
row(3, 'Mark', 8000, hr),
row(4, 'Rey', 5000, marketing),
row(5, 'Tan', 4000, 'IT')].
The library guesses the field separator from the filename extension. Here it correctly guessed that 'csv' means the comma "," and 'tsv' means the tab. We can make the tab explicitly visible with cat -t.
$ cat example.tsv
id name salary department
1 john 2000 sales
2 Andrew 5000 finance
3 Mark 8000 hr
4 Rey 5000 marketing
5 Tan 4000 IT
$ cat -t example.tsv
id^Iname^Isalary^Idepartment^M
1^Ijohn^I2000^Isales^M
2^IAndrew^I5000^Ifinance^M
3^IMark^I8000^Ihr^M
4^IRey^I5000^Imarketing^M
5^ITan^I4000^IIT^M
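When the file extension does not reveal the format, the separator can be stated explicitly instead of guessed. The following is a minimal sketch, assuming the separator(Code) option that library(csv) accepts for reading and writing; csv_to_tsv/2 is a hypothetical helper name, not something the library provides.

```prolog
:- use_module(library(csv)).

% csv_to_tsv(+In, +Out)
% Hypothetical helper: read comma-separated rows from the file In and
% write them to the file Out with tabs as separators, whatever the
% file names look like.
csv_to_tsv(In, Out) :-
    csv_read_file(In, Rows, [separator(0',)]),    % 0', is the code of ","
    csv_write_file(Out, Rows, [separator(0'\t)]). % 0'\t is the tab code
```

With the example above, csv_to_tsv('example.csv', 'example.tsv') should do the same job as the earlier query, only without relying on the extensions.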
Upvotes: 2
Reputation: 4438
split_string/4 is not reversible. You can use a DCG instead. Here is a simple multi-line DCG parser for CSV:
% Nicer formatting
% https://www.swi-prolog.org/pldoc/man?section=flags
:- set_prolog_flag(answer_write_options, [quoted(true), portray(true), spacing(next_argument), max_depth(100), attributes(portray)]).
% Show lists of codes as text (if 3 chars or longer)
:- portray_text(true).
csv_lines([]) --> [].
% Newline after every line
csv_lines([H|T]) --> csv_fields(H), [10], csv_lines(T).
csv_fields([H|T]) --> csv_field(H), csv_field_end(T).
csv_field_end([]) --> [].
% Comma between fields
csv_field_end(T) --> [44], csv_fields(T).
csv_field([]) --> [].
csv_field([H|T]) -->
    [H],
    % Fields cannot contain comma, newline or carriage return
    { maplist(dif(H), [44, 10, 13]) },
    csv_field(T).
To demonstrate reversibility:
% Note: z is char 122
?- phrase(csv_lines([[`def`, `cool`], [`abc`, [122]]]), Lines).
Lines = `def,cool\nabc,z\n` ;
false.
?- phrase(csv_lines(Fields), `def,cool\nabc,z\n`).
Fields = [[`def`, `cool`], [`abc`, [122]]] ;
false.
To parse the field contents while maintaining reversibility, you can use, e.g., atom_codes/2.
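As a rough illustration of that last step, the nested lists of codes produced by csv_lines//1 could be related to nested lists of atoms like this. The names rows_atoms/2 and codes_atom/2 are hypothetical, and the sketch assumes that at least one of the two arguments is fully instantiated.

```prolog
% rows_atoms(?CodeRows, ?AtomRows)
% Hypothetical helper: each field in CodeRows is a list of character
% codes, and the matching field in AtomRows is the corresponding atom.
rows_atoms(CodeRows, AtomRows) :-
    maplist(maplist(codes_atom), CodeRows, AtomRows).

% atom_codes/2 with its arguments swapped, so that the codes come first.
codes_atom(Codes, Atom) :-
    atom_codes(Atom, Codes).
```

For example, rows_atoms([[`def`, `cool`], [`abc`, [122]]], Rows) gives Rows = [[def, cool], [abc, z]], and calling it with only the second argument bound rebuilds the code lists that csv_lines//1 expects.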
Upvotes: 1