Reputation: 253
I'm trying to read from a file using Scheme and put its contents into a list.
The problem is how to remove the question marks and numbers and keep just the words. Should I use a loop to check each item? If not, how can I get the next word from "read"?
I tried to solve it with this code, but I can't find a way to keep calling "read" until the end of the file:
(define Project
  (lambda (fileName)
    (if (null? fileName)
        'error
        (readNext (open fileName) '()))))
(define readNext
  (lambda (fc tmp)
    (if (null? (read fc) "#<eof>")
        tmp
        (readNext fc (cons (read fc) tmp)))))
Upvotes: 11
Views: 16463
Reputation: 256
Update: R7RS
Under its new-ish (2013) standard R7RS, Scheme now provides the function read-string, which standardizes and simplifies the answer to the question of this thread. Let's demonstrate it on a simple test file called mydata.txt:
bash$ cat mydata.txt
"Hello world!"
Is this microphone on?
Testing 1 2 3...
To read the whole file into a single string, you can use read-string on a Scheme REPL like this:
> (read-string 100 (open-input-file "mydata.txt"))
"\"Hello world!\"\nIs this microphone on?\nTesting 1 2 3...\n"
That second line is, of course, the REPL showing you the string returned by read-string. Notice that the quotation marks are properly escaped, which was one of the issues addressed in Will's answer.
As an aside: the first argument to read-string is the maximum number of characters to read. Make sure to set it to a value at least as large as your own files, lest their contents get truncated.
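If the file size is not known in advance, one option is to read fixed-size chunks until read-string returns the end-of-file object. The following is only a sketch: the name slurp and the chunk size of 4096 are my own choices, not part of R7RS, and the import line assumes an R7RS implementation.

```scheme
(import (scheme base) (scheme file))

;; Read a whole text file into one string, chunk by chunk.
;; `slurp` is a hypothetical helper name; 4096 is an arbitrary chunk size.
(define (slurp path)
  (call-with-input-file path
    (lambda (in)
      (let ((out (open-output-string)))          ; accumulate chunks here
        (let loop ((chunk (read-string 4096 in)))
          (if (eof-object? chunk)
              (get-output-string out)            ; everything read so far
              (begin
                (write-string chunk out)
                (loop (read-string 4096 in)))))))))
```

Collecting the chunks in a string port avoids both guessing the file size and calling string-append once per chunk.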
Portability
I verified the above solution with the Chibi, Chicken, and Gauche implementations of Scheme. At least in theory, it should also work with every other R7RS-compliant Scheme. The website schemers.org maintains a table of implementations that claim compliance. I can't guarantee the accuracy of their claims, obviously.
Also potentially interesting
In addition to read-string, the R7RS standard and its implementations also provide a read-bytevector function, which works the same way on binary files. You can use it to read a binary file into a bytevector.
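As a rough sketch of how that might look (the helper name file->bytevector and its max-bytes parameter are made up for illustration, and an R7RS implementation is assumed):

```scheme
(import (scheme base) (scheme file))

;; Read up to max-bytes bytes from a binary file.
;; `file->bytevector` is a hypothetical name, not a standard procedure.
(define (file->bytevector path max-bytes)
  (let* ((port  (open-binary-input-file path))
         (bytes (read-bytevector max-bytes port)))
    (close-input-port port)
    ;; read-bytevector returns the eof object for an empty file,
    ;; so map that case to an empty bytevector.
    (if (eof-object? bytes)
        (bytevector)
        bytes)))
```

As with read-string, the count is an upper bound, so it has to be at least as large as the file.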
A final R7RS function to mention here is read-line, which reads a text file one line at a time. Hence, if you want to read your file into a list of lines, like Python's readlines function, you can now implement a Scheme version of readlines like this:
(define (readlines file)
  (let ((infile (open-input-file file)))
    (let loop ((lines '())
               (next-line (read-line infile)))
      (if (eof-object? next-line)
          (begin (close-input-port infile)
                 (reverse lines))
          (loop (cons next-line lines)
                (read-line infile))))))
Let's test it in the REPL:
> (define ls (readlines "mydata.txt"))
> (car ls)
"\"Hello world!\""
> (cadr ls)
"Is this microphone on?"
> (caddr ls)
"Testing 1 2 3..."
I hope this update helps.
Upvotes: 1
Reputation: 267
Read lines from a file using "list-ec" from SRFI-42:
(use srfi-42) ; Chicken
or
(require srfi/42) ; Racket
(define (file->lines filename)
  (call-with-input-file filename
    (lambda (p)
      (list-ec (:port line p read-line) line))))
Parsing a line using SRFI-13 and SRFI-14:
(use srfi-13) (use srfi-14) ; Chicken
or
(require srfi/13) (require srfi/14) ; Racket
(string-tokenize "hi; ho")
("hi;" "ho")
(string-tokenize "hi; ho" char-set:letter)
("hi" "ho")
Upvotes: 3
Reputation: 6189
I don't know enough to say anything about portability like the other answer, but if you were to use Racket, it would be as simple as the following:
(file->lines "somefile")
Upvotes: 1
Reputation: 1108
The most recommended way to import text is to edit and save the file as a scheme file defining a variable:
(define data "the text in
mydata.scm here")
and then calling:
(load "mydata.scm")
Often, however, a data file cannot simply be edited and saved as a Scheme file. Newlines inside the string are handled automatically, but double quotes are not, and that causes a problem when loading the file.
Some implementation specific techniques are:
;Chicken
(use utils)
(read-all "mydata.txt")
;Racket
(file->string "mydata.txt")
A more portable function is:
;works in chicken-csi and Racket
(define (readlines filename)
  (call-with-input-file filename
    (lambda (p)
      (let loop ((line (read-line p))
                 (result '()))
        (if (eof-object? line)
            (reverse result)
            (loop (read-line p) (cons line result)))))))
Running an executable compiled with chicken-csc gives an error, because read-line requires an extra file.
The most portable way to read a file is this function:
;works in Chicken, Racket, SISC
;Read a file to a list of chars
(define (file->char_list path)
  (call-with-input-file path
    (lambda (input-port)
      (let loop ((x (read-char input-port)))
        (cond
          ((eof-object? x) '())
          (#t (begin (cons x (loop (read-char input-port))))))))))
This function is reasonably fast and portable across implementations. All that is needed is to convert the char_list to a string.
The simplest way is:
;may not work if there is limit on arguments
(apply string (file->char_list "mydata.txt"))
The catch is some implementations have a limit on the number of arguments that can be passed to a function. A list of 2049 chars would not work in Chicken.
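One way to sidestep the argument limit, shown here only as a small sketch: the standard procedure list->string (in both R5RS and R7RS) takes the whole list as a single argument, so apply is not involved. The names sample-chars and sample-string are just for illustration, and the import line assumes an R7RS implementation; elsewhere it can probably be dropped.

```scheme
(import (scheme base))

;; A list of characters, standing in for the result of file->char_list.
(define sample-chars (list #\H #\i #\!))

;; list->string receives one argument (the list), however long it is.
(define sample-string (list->string sample-chars))
```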
Another method is:
;works in Chicken, Racket
(foldr (lambda (x y) (string-append (string x) y)) "" (file->char_list "mydata.txt"))
The problems are: First, foldr is not universally recognized (SISC), though it could be defined. Second, this method is very slow due to appending each character.
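For implementations without foldr, a minimal definition might look like the sketch below. It assumes the same argument order as the call above (function, initial value, list) and is not tail-recursive, so very long lists may use a lot of stack. The import line assumes an R7RS implementation.

```scheme
(import (scheme base))

;; Right fold: (foldr f init '(a b c)) computes (f a (f b (f c init))).
(define (foldr f init lst)
  (if (null? lst)
      init
      (f (car lst) (foldr f init (cdr lst)))))
```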
I wrote the next two functions to slice a list of chars into nested lists until no list at the lowest level exceeds the maximum argument count in Chicken. The third function traverses the nested char list and returns a string, using string and string-append:
(define (cleave_at n a)
  (cond
    ((null? a) '())
    ((zero? n) (list '() a))
    (#t
     ((lambda (x)
        (cons (cons (car a) (car x)) (cdr x)))
      (cleave_at (- n 1) (cdr a))))))
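(define (cleave_binary_nest n a)
  (cond
    ;; Short enough already (this also covers the empty list).
    ((<= (length a) n) (list a))
    (#t
     ((lambda (x)
        (cond
          ;; The second half is never shorter than the first, so test it.
          ((> (length (cadr x)) n) (map (lambda (y) (cleave_binary_nest n y)) x))
          (#t x)))
      (cleave_at (floor (/ (length a) 2)) a)))))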
(define (binary_nest_char->string a)
  (cond
    ((null? a) "")
    ((char? (car a)) (apply string a))
    (#t (string-append
         (binary_nest_char->string (car a)) (binary_nest_char->string (cdr a))))))
The function is called like this:
;Works in Racket, Chicken, SISC
;faster than foldr method (3x faster interpreted Chicken) (30x faster compiled Chicken) (125x faster Racket gui)
(binary_nest_char->string (cleave_binary_nest 2048 (file->char_list "mydata.txt")))
To reduce to alphabetic characters and space there are two more functions:
(define (alphaspace? x)
  (cond
    ((and (char-ci>=? x #\a) (char-ci<=? x #\z)) #t)
    ((equal? x #\space) #t)
    (#t #f)))
(define (filter pred lis)
  ; if lis is empty
  (if (null? lis)
      ; return an empty list
      '()
      ; otherwise, if the predicate is true on the first element
      (if (pred (car lis))
          ; return the first element concatenated with the
          ; result of calling filter on the rest of lis
          (cons (car lis) (filter pred (cdr lis)))
          ; otherwise (if the predicate was false) just
          ; return the result of filtering the rest of lis
          (filter pred (cdr lis)))))
(define data (file->char_list "mydata.txt"))
(define data_alphaspace (filter alphaspace? data))
(define result (binary_nest_char->string (cleave_binary_nest 2048 data_alphaspace)))
This works on Racket, Chicken (interpreted and compiled), and SISC (Java). Each of those dialects should also work on Linux, Mac (OS X), and Windows.
Upvotes: 15
Reputation: 70235
Maybe this will get you started.
(define (file->list-of-chars file)
  (with-input-from-file file
    (lambda ()
      (let reading ((chars '()))
        (let ((char (read-char)))
          (if (eof-object? char)
              (reverse chars)
              (reading (cons char chars))))))))
Upvotes: 5