gathering two files to another file

Question

i have two files file1.txt, file2.txt, I would like collected them in file3.txt after checking lines of two files.

example :

file1.txt

line 1    T
line 2    F
line 3    T
line 4    T
line 5    F
line 6    F

file2.txt

line 1    T
line 2    T
line 3    F
line 4    T
line 5    F
line 6    T

file3.txt

    file1 
    ********************
    number of line  = 6
    number of true  = 3
    number of false = 3
    ********************
    line 1    T
    line 3    T
    line 4    T

    file2
    ********************
    number of line  = 6
    number of true  = 4
    number of false = 2
    ********************
    line 1    T
    line 2    T
    line 4    T
    line 6    T

For each file, the resulting file should contain a header showing the number of lines, the number of lines that were true (T) and the number of lines that were false (F). Next only the lines that were true are printed.

some help ?

willeM_ Van Onsem · Accepted Answer

You can do this as follows. We first define a predicate merge/2 with the first item a list of filenames that must be read (here [file1.txt,file2.txt]), and the second parameter the name of the file to which you wish to write (here file3.txt). Now we define merge/2 as:

merge(Inp,Outp) :-
    open(Outp,write,Outs),
    mergeS(Inp,Outs).

So we open a file with the name Outp and the get the corresponding stream Outs, we then call mergeS/2.

There are two cases for mergeS/2:

all input files have been processed, so we can stop processing and close the stream:
```
mergeS([],OutS) :-
    close(OutS).
```
there is still at least one file we need to process:
```
mergeS([H|T],OutS) :-
    open(H,read,InS),
    atom_chars(FileN,H),
    process(FileN,InS,OutS),
    close(InS),
    mergeS(T,OutS).
```
The core of this predicate is evidently process/3, but in order to make things more convenient. We do the file handling already in mergeS.

Next our process/3 predicate reads:

process(Header,InS,OutS) :-
    get_lines(InS,Lin,NL,NT,NF),
    write(OutS,Header),nl(OutS),
    write(OutS,'********************'),nl(OutS),
    write(OutS,'number of line  = '),write(OutS,NL),nl(OutS),
    write(OutS,'number of true  = '),write(OutS,NT),nl(OutS),
    write(OutS,'number of false = '),write(OutS,NF),nl(OutS),
    write(OutS,'********************'),nl(OutS),
    print_lines(OutS,Lin),
    nl(OutS).

We first gather the content of the file with get_lines/5. This predicate will simultaneously calculate statistics like the number of lines, the number of trues and the number of false. Next we use a number of write/2 and nl/1 statements to write statistics to the output file and then use a predicate print_lines/2 that will write the content of the file to file3.txt:

get_lines and get_line

get_lines/5 uses three accumulators to calculate statistics. This is done by initializing three accumulators and then make a call to get_lines/8:

get_lines(Ins,Lin,NL,NT,NF) :-
    get_lines(Ins,Lin,0,0,0,NL,NT,NF).

get_lines/8 is a recursive function that processes one line at a time, determines whether the line is T or F, updates the accumulators, and will parse the next line. It is however possible that we are given an empty file. In order to make our approach more robust we thus write:

get_lines(InS,[],NL,NT,NF,NL,NT,NF) :-
    at_end_of_stream(InS),
    !.

and the recursive case:

get_lines(InS,[H|T],NL0,NT0,NF0,NL,NT,NF) :-
    get_line(InS,HCs),
    inspect(HCs,NL0,NT0,NF0,NL1,NT1,NF1),
    atom_chars(H,HCs),
    get_lines(InS,T,NL1,NT1,NF1,NL,NT,NF).

get_line/2 simply reads the next line of the file, and returns a stream of characters. It terminates at ' ' (inclusive, but not in the result).

get_line(InS,[]) :-
    at_end_of_stream(InS),
    !.
get_line(InS,[H|T]) :-
    get_char(InS,H),
    H \= '
',
    !,
    get_line(InS,T).
get_line(_,[]).

inspect/7

Now we still have to inspect our line with inspect/7. We again use three accumulators (as you could already guess based on the implementation of get_lines/8). First we fetch the last character of the line. If no such character exists, the statistics won't change (it could be possible somewhere an empty line is introduced). Otherwise we obtain the last character with last/2 and inspect it with inspect_last/7:

inspect(L,NL0,NT0,NF0,NL,NT,NF) :-
    last(L,LL),
    !,
    NL is NL0+1,
    inspect_last(LL,NT0,NF0,NT,NF).
inspect(_,NL0,NT,NF,NL1,NT,NF) :-
    !,
    NL1 is NL0+1.

inspect_last determines whether the last character is a T, F or something else, and updates the accumulators accordingly:

inspect_last('T',NT0,NF,NT1,NF) :-
    !,
    NT1 is NT0+1.
inspect_last('F',NT,NF0,NT,NF1) :-
    !,
    NF1 is NF0+1.
inspect_last(_,NT,NF,NT,NF).

print_lines/2:

finally we still need to print our file to the output. This is done using print_lines/2 which is rather straightforward:

print_lines(_,[]) :-
    !.
print_lines(OutS,[H|T]) :-
    write(OutS,H),
    nl(OutS),
    print_lines(OutS,T).

Full code

The full code:

merge(Inp,Outp) :-
    open(Outp,write,Outs),
    mergeS(Inp,Outs).

mergeS([],OutS) :-
    close(OutS).

mergeS([H|T],OutS) :-
    open(H,read,InS),
    atom_chars(FileN,H),
    process(FileN,InS,OutS),
    close(InS),
    mergeS(T,OutS).

process(Header,InS,OutS) :-
    get_lines(InS,Lin,NL,NT,NF),
    write(OutS,Header),nl(OutS),
    write(OutS,'********************'),nl(OutS),
    write(OutS,'number of line  = '),write(OutS,NL),nl(OutS),
    write(OutS,'number of true  = '),write(OutS,NT),nl(OutS),
    write(OutS,'number of false = '),write(OutS,NF),nl(OutS),
    write(OutS,'********************'),nl(OutS),
    print_lines(OutS,Lin),
    nl(OutS).

get_lines(Ins,Lin,NL,NT,NF) :-
    get_lines(Ins,Lin,0,0,0,NL,NT,NF).

get_lines(InS,[],NL,NT,NF,NL,NT,NF) :-
    at_end_of_stream(InS),
    !.
get_lines(InS,[H|T],NL0,NT0,NF0,NL,NT,NF) :-
    get_line(InS,HCs),
    inspect(HCs,NL0,NT0,NF0,NL1,NT1,NF1),
    atom_chars(H,HCs),
    get_lines(InS,T,NL1,NT1,NF1,NL,NT,NF).

get_line(InS,[]) :-
    at_end_of_stream(InS),
    !.
get_line(InS,[H|T]) :-
    get_char(InS,H),
    H \= '
',
    !,
    get_line(InS,T).
get_line(_,[]).

inspect(L,NL0,NT0,NF0,NL,NT,NF) :-
    last(L,LL),
    !,
    NL is NL0+1,
    inspect_last(LL,NT0,NF0,NT,NF).
inspect(_,NL0,NT,NF,NL1,NT,NF) :-
    !,
    NL1 is NL0+1.

inspect_last('T',NT0,NF,NT1,NF) :-
    !,
    NT1 is NT0+1.
inspect_last('F',NT,NF0,NT,NF1) :-
    !,
    NF1 is NF0+1.
inspect_last(_,NT,NF,NT,NF).

print_lines(_,[]) :-
    !.
print_lines(OutS,[H|T]) :-
    write(OutS,H),
    nl(OutS),
    print_lines(OutS,T).

If one now queries:

?- merge(["file1.txt","file2.txt"],"file3.txt").
true.

a file named file3.txt is constructed with content:

file1.txt
********************
number of line  = 6
number of true  = 3
number of false = 3
********************
line 1    T
line 2    F
line 3    T
line 4    T
line 5    F
line 6    F

file2.txt
********************
number of line  = 6
number of true  = 4
number of false = 2
********************
line 1    T
line 2    T
line 3    F
line 4    T
line 5    F
line 6    T

Which is more or less what you want. Please comment if further errors occur.

EDIT

Somehow I didn't get you wanted to filter the lines and only show the ones that are true (T). You can do so by simply modifying get_lines/7:

get_lines(InS,[],NL,NT,NF,NL,NT,NF) :-
    at_end_of_stream(InS),
    !.
get_lines(InS,Res,NL0,NT0,NF0,NL,NT,NF) :-
    get_line(InS,HCs),
    inspect(HCs,NL0,NT0,NF0,NL1,NT1,NF1),
    atom_chars(H,HCs),
    ( NT1 > NT0
    -> Res =[H|T]
    ; Res = T
    ),
    get_lines(InS,T,NL1,NT1,NF1,NL,NT,NF).

(added boldface where the content changed)

This is not the most elegant way to do it, but it is probably the shortest fix.

Now the file on my machine shows:

file1.txt
********************
number of line  = 6
number of true  = 3
number of false = 3
********************
line 1    T
line 3    T
line 4    T

file2.txt
********************
number of line  = 6
number of true  = 4
number of false = 2
********************
line 1    T
line 2    T
line 4    T
line 6    T

gathering two files to another file

Answers (2)

Full code

Related Questions