killajoule

Parsing numerical data using Prolog?

I am new to Prolog and am considering using it for a small data-analysis application. Here is what I want to accomplish:

I have a CSV file with data of the following form:

a,b,c
d,e,f
g,h,i
...
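A minimal sketch for loading such a file, assuming SWI-Prolog's library(csv) and exactly three numeric columns per line: `csv_read_file/3` turns each line into a `row(A, B, C)` term and, by default, converts numeric fields to numbers. The predicate name `load_rows/2` is hypothetical.

```prolog
:- use_module(library(csv)).

% load_rows(+File, -Rows): read a CSV file with three numeric columns
% into a list of row(A, B, C) terms. csv_read_file/3 converts numeric
% fields to numbers by default; arity(3) asks for three fields per line.
load_rows(File, Rows) :-
    csv_read_file(File, Rows, [functor(row), arity(3)]).
```

For a file containing the lines `1,2,3` and `4,5,6`, `load_rows/2` gives `[row(1,2,3), row(4,5,6)]`.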

The data is purely numerical. First, I need to group the rows according to the following scheme:

[Image: the row-grouping scheme (not available)]

So what's going on above?

I start at the first row, which has value 'a' in column one. Then I keep going down the rows until I reach a row whose value in column one differs from 'a' by a certain amount, 'z'. The process is then repeated, and many "groups" have been formed once it is complete.
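The grouping rule can be sketched in SWI-Prolog as follows. This is a hypothetical illustration, not code from the question, and it rests on assumptions the text leaves open: rows are `row(A, B, C)` terms, "differs by a certain amount" is read as `abs(A - A0) >= Z` where `A0` is the column-one value of the group's first row, the row that reaches the threshold is taken to be the last member of its group, and a trailing unfinished group is kept. `group_rows/3` and `take_group/5` are illustrative names.

```prolog
% group_rows(+Z, +Rows, -Groups): split a list of row(A, B, C) terms into
% consecutive groups. Assumed rule: a group ends at the first row whose
% column-one value differs from that of the group's first row by at
% least Z; that row is the last member of the group. A trailing group
% that never reaches the threshold is kept as the final group.
group_rows(_, [], []).
group_rows(Z, [First|Rows], [[First|Members]|Groups]) :-
    First = row(A0, _, _),
    take_group(Z, A0, Rows, Members, Rest),
    group_rows(Z, Rest, Groups).

% take_group(+Z, +A0, +Rows, -Members, -Rest): Members are the rows up
% to and including the first one with abs(A - A0) >= Z; Rest follows it.
take_group(_, _, [], [], []).
take_group(Z, A0, [Row|Rows], [Row|Members], Rest) :-
    Row = row(A, _, _),
    (   abs(A - A0) >= Z
    ->  Members = [], Rest = Rows
    ;   take_group(Z, A0, Rows, Members, Rest)
    ).
```

With `Z = 6` and column-one values 1, 4, 7, 10, 40, 70, 16, this rule gives the groups [1, 4, 7], [10, 40] and [70, 16].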

For each of these groups, I want the mean of columns two and three. For example, for the first group in the picture above, the mean of column two would be (b+e+h)/3.
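Computing both means for one group needs only a single pass with a running count and two running sums. A minimal sketch, assuming each group is a non-empty list of `row(A, B, C)` terms; `group_means/3` and `add_row/3` are hypothetical names.

```prolog
:- use_module(library(apply)).   % foldl/4

% group_means(+Group, -Mean2, -Mean3): means of columns two and three
% over a non-empty list of row(A, B, C) terms.
group_means(Group, Mean2, Mean3) :-
    Group = [_|_],                          % an empty group has no mean
    foldl(add_row, Group, 0-0-0, N-S2-S3),
    Mean2 is S2 / N,
    Mean3 is S3 / N.

% add_row(+Row, +Acc0, -Acc): add one row to the count/sum accumulator.
add_row(row(_, B, C), N0-S20-S30, N-S2-S3) :-
    N is N0 + 1,
    S2 is S20 + B,
    S3 is S30 + C.
```

For the first group this computes (b+e+h)/3 and (c+f+i)/3.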

I am fairly sure this can be done in Prolog. However, I have 50,000+ rows of data and, since Prolog is declarative, I am not sure how efficiently it would handle this task.

Is it feasible to write a Prolog program for this task whose efficiency is not significantly lower than that of a procedural equivalent?
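One common way to keep a Prolog version close to a procedural loop is a single tail-recursive pass that carries the open group's row count and column sums as accumulator arguments, so each row is visited once and no group lists are built. The sketch below does this for a list of `row(A, B, C)` terms. The boundary rule (`abs(A - A0) >= Z`, with the closing row counted in its group) and the names `stream_means/3` and `scan/7` are assumptions for illustration, and this is not a measured claim about performance.

```prolog
% stream_means(+Z, +Rows, -Means): one pass over row(A, B, C) terms,
% producing Mean2-Mean3 for each group. Assumed rule: a row with
% abs(A - A0) >= Z (A0 = first column-one value of the group) closes
% the group it belongs to; a trailing open group is also reported.
stream_means(_, [], []).
stream_means(Z, [row(A0, B, C)|Rows], Means) :-
    scan(Rows, Z, A0, 1, B, C, Means).

% scan(+Rows, +Z, +A0, +N, +S2, +S3, -Means): N, S2 and S3 are the row
% count and the column-two/column-three sums of the open group.
scan([], _, _, N, S2, S3, [M2-M3]) :-      % flush the last open group
    M2 is S2 / N,
    M3 is S3 / N.
scan([row(A, B, C)|Rows], Z, A0, N0, S20, S30, Means) :-
    N is N0 + 1,
    S2 is S20 + B,
    S3 is S30 + C,
    (   abs(A - A0) >= Z                   % this row closes the group
    ->  M2 is S2 / N,
        M3 is S3 / N,
        Means = [M2-M3|Means1],
        stream_means(Z, Rows, Means1)      % start a new group
    ;   scan(Rows, Z, A0, N, S2, S3, Means)
    ).
```

To get a rough timing on data of the size mentioned, SWI-Prolog's `time/1` can wrap a call on synthetic rows, e.g. `numlist(1, 50000, Is), findall(row(I, 1, 2), member(I, Is), Rows), time(stream_means(10, Rows, Ms))`.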

Answers (1)

CapelliC

This snippet could be a starting point for your task:

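:- use_module(library(dcg/basics)).  % number//1
:- use_module(library(pio)).         % phrase_from_file/2
:- use_module(library(aggregate)).   % aggregate_all/3

% rownum(+Z, -AveList): parse numbers.txt and collect one (Mean2,Mean3)
% pair per closed group; the most recently closed group comes first.
rownum(Z, AveList) :- phrase_from_file(row_scan(Z, [], [], AveList), 'numbers.txt').

% parse one "A,B,C\n" line (every row, including the last, must end in
% a newline), update the open group and the averages, then recurse
row_scan(Z, Group, AveSoFar, AveList) -->
    number(A),",",number(B),",",number(C),"\n",
    { row_match(Z, A,B,C, Group,AveSoFar, Group1,AveUpdated) },
    row_scan(Z, Group1, AveUpdated, AveList).
% end of input (an extra blank line is accepted); the rows of a group
% that was never closed are dropped
row_scan(_Z, _Group, AveList, AveList) --> "\n";[].

% row_match(Z, A,B,C, Group,Ave, Group1,Ave1)
% empty group: this row starts a new one
row_match(_, A,B,C, [],Ave, [(A,B,C)],Ave).
row_match(Z, A,B,C, [H|T],Ave, Group1,Ave1) :-
    H = (F,_,_),               % F: column one of the group's first row
    (  A - F =:= Z             % this row closes the group
    -> % count the rows and sum columns 2 and 3, including this row
       aggregate_all(agg(count,sum(C2),sum(C3)),
        member((_,C2,C3), [(A,B,C), H|T]), agg(Count,T2,T3)),
       A2 is T2/Count, A3 is T3/Count,
       Group1 = [], Ave1 = [(A2,A3)|Ave]
    ;  % otherwise keep H first and add this row to the group
       Group1 = [H,(A,B,C)|T], Ave1 = Ave
    ).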

With this input in numbers.txt

1,2,3
4,5,6
7,8,9
10,2,3
40,5,6
70,8,9
16,0,0

yields

?- rownum(6,L).
L = [ (3.75, 4.5), (5, 6)] 
