Barttttt

Reputation: 253

Reading from file, using Scheme

I'm trying to read from a file using Scheme, and to put its content into a list.

The problem is how to remove the question marks and numbers and keep just the words. Should I use a loop that checks each item? If not, how can I get the next word from "read"?

I tried to solve it with this code, but I can't find a way to keep calling "read" until the end of the file is reached:

(define Project
  (lambda (fileName)
    (if (null? fileName) 
        'error
        (readNext (open fileName) '()))))

(define readNext
  (lambda (fc tmp)
    (if (null? (read fc) "#<eof>")
        tmp
        (readNext fc (cons (read fc) tmp)))))
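For comparison, here is a minimal corrected sketch of the attempt above. It keeps the question's structure (the names project and read-next are adapted from Project and readNext) and makes three assumptions-free changes using standard procedures: the non-standard open becomes call-with-input-file, which also closes the file; read is called only once per step, so no datum is skipped; and the end of the file is detected with eof-object? rather than by comparing against the printed form "#<eof>".

```scheme
;; Read every Scheme datum from PORT until end of file, preserving order.
(define (read-next port acc)
  (let ((datum (read port)))               ; call read exactly once per step
    (if (eof-object? datum)                ; read returns an eof object at the end
        (reverse acc)                      ; acc was built in reverse order
        (read-next port (cons datum acc)))))

;; Open FILE-NAME, read all of its data, and close it again.
(define (project file-name)
  (call-with-input-file file-name
    (lambda (port) (read-next port '()))))
```

Note that read parses Scheme syntax rather than words: on? comes back as the symbol on?, and a quoted phrase comes back as a single string. That is why the answers below read characters or lines instead.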

Upvotes: 11

Views: 16463

Answers (5)

Thomas Blankenhorn

Reputation: 256

Update: R7RS

Under its new-ish (2013) standard, R7RS, Scheme now provides the procedure read-string, which standardizes and simplifies the answer to this question. Let's demonstrate it on a simple test file called mydata.txt:

bash$ cat mydata.txt
"Hello world!"
Is this microphone on?
Testing 1 2 3...

To read the whole file into a single string, you can use read-string on a Scheme REPL like this:

> (read-string 100 (open-input-file "mydata.txt"))
"\"Hello world!\"\nIs this microphone on?\nTesting 1 2 3...\n"

That second line is, of course, the REPL showing you the string returned by read-string. Notice that the quotation marks are properly escaped, which was one of the issues addressed in Will's answer.

As an aside: the first argument to read-string is the maximum number of characters for it to read. Make sure to set it to a value that reflects the actual size of your own files, lest they get truncated.
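If the file size is not known in advance, one possible approach (a sketch, not part of the original answer) is to call read-string repeatedly in fixed-size chunks until it returns an end-of-file object, collecting the pieces in a string port. It uses only R7RS procedures (read-string, write-string, open-output-string, get-output-string); the name read-whole-port and the chunk size of 4096 are arbitrary choices.

```scheme
;; Read everything left on PORT into one string, 4096 characters at a time.
(define (read-whole-port port)
  (let ((out (open-output-string)))            ; accumulates the chunks
    (let loop ()
      (let ((chunk (read-string 4096 port)))   ; at most 4096 characters
        (if (eof-object? chunk)                ; nothing left to read
            (get-output-string out)
            (begin
              (write-string chunk out)
              (loop)))))))

;; Usage: (call-with-input-file "mydata.txt" read-whole-port)
```

Using a string port instead of (apply string-append chunks) avoids passing one argument per chunk, which matters on implementations with an argument-count limit.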

Portability

I verified the above solution with the Chibi, Chicken, and Gauche implementations of Scheme. At least in theory, it should also work with every other R7RS-compliant Scheme. The website schemers.org maintains a table of implementations that claim compliance. I can't guarantee the accuracy of their claims, obviously.

Also potentially interesting

In addition to read-string, the R7RS standard and its implementations also provide a read-bytevector procedure, which works the same way on binary input ports. You can use it to read a binary file into a bytevector.
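The same chunked pattern works for binary data. A minimal sketch using R7RS procedures; read-whole-binary-port and mydata.bin are hypothetical names, and the port must be a binary port, for example one opened with open-binary-input-file:

```scheme
;; Read everything left on the binary PORT into one bytevector.
(define (read-whole-binary-port port)
  (let ((out (open-output-bytevector)))          ; accumulates the chunks
    (let loop ()
      (let ((chunk (read-bytevector 4096 port))) ; at most 4096 bytes
        (if (eof-object? chunk)                  ; nothing left to read
            (get-output-bytevector out)
            (begin
              (write-bytevector chunk out)
              (loop)))))))

;; Usage (hypothetical file name):
;; (call-with-port (open-binary-input-file "mydata.bin") read-whole-binary-port)
```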

A final R7RS procedure to mention here is read-line, which reads a text file one line at a time. Hence, if you want to read your file into a list of lines, like Python's readlines function, you can now implement a Scheme version of readlines like this:

(define (readlines file)
  (let ((infile (open-input-file file)))
    (let loop ((lines '())
               (next-line (read-line infile)))
      (if (eof-object? next-line)
          (begin (close-input-port infile)
                 (reverse lines))
          (loop (cons next-line lines)
                (read-line infile))))))

Let's test it in the REPL:

> (define ls (readlines "mydata.txt"))
> (car ls)
"\"Hello world!\""
> (cadr ls)
"Is this microphone on?"
> (caddr ls)
"Testing 1 2 3..."

I hope this update helps.

Upvotes: 1

to_the_crux

Reputation: 267

Read lines from a file using "list-ec" from SRFI-42:

(use srfi-42) ; Chicken
  or
(require srfi/42) ; Racket

(define (file->lines filename)
  (call-with-input-file filename
    (lambda (p)
      (list-ec (:port line p read-line) line))))

Parsing a line using SRFI-13 and SRFI-14:

(use srfi-13) (use srfi-14) ; Chicken
  or
(require srfi/13) (require srfi/14) ; Racket

(string-tokenize "hi; ho")
("hi;" "ho")

(string-tokenize "hi; ho" char-set:letter)
("hi" "ho")

Upvotes: 3

Zelphir Kaltstahl

Reputation: 6189

I don't know enough to say anything about portability like the other answer, but if you were to use Racket, it would be as simple as the following:

(file->lines "somefile")

Upvotes: 1

Will

Reputation: 1108

The most commonly recommended way to import text is to edit the file and save it as a Scheme file that defines a variable:

(define data "the text in
mydata.scm here")

and then calling:

(load "mydata.scm")

Often, though, a data file cannot simply be edited and saved as a Scheme file: newlines may appear literally inside a string literal, but double quotes cannot, because they would end the string, and this causes problems when the file is loaded.

Some implementation specific techniques are:

;Chicken
(use utils)
(read-all "mydata.txt")

;Racket
(file->string "mydata.txt")

A more portable function is:

;works in chicken-csi and Racket
(define (readlines filename)
  (call-with-input-file filename
    (lambda (p)
      (let loop ((line (read-line p))
                 (result '()))
        (if (eof-object? line)
            (reverse result)
            (loop (read-line p) (cons line result)))))))

Running an executable compiled with Chicken's csc will give an error, because read-line requires an extra library unit.

The most portable way to read a file is this function:

;works in Chicken, Racket, SISC
;Read a file to a list of chars
(define (file->char_list path)
 (call-with-input-file path
   (lambda (input-port)
     (let loop ((x (read-char input-port)))
       (cond
        ((eof-object? x) '())
        (else (cons x (loop (read-char input-port)))))))))

This function is reasonably fast and portable across implementations. All that is needed is to convert the char_list to a string.

The simplest way is:

;may not work if there is limit on arguments
(apply string (file->char_list "mydata.txt"))

The catch is that some implementations limit the number of arguments that can be passed to a procedure. A list of 2049 chars would not work in Chicken.
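A simpler way around the argument limit, in any implementation following R5RS or R7RS, is the standard procedure list->string, which receives the whole character list as a single argument. The sketch below reads from a port so it can be tried without a file; port->char-list mirrors file->char_list above but is tail-recursive, and both port->char-list and char-port->string are illustrative names.

```scheme
;; Collect the characters remaining on PORT, in order.
(define (port->char-list port)
  (let loop ((acc '()))
    (let ((c (read-char port)))
      (if (eof-object? c)
          (reverse acc)              ; acc was built in reverse order
          (loop (cons c acc))))))

;; list->string takes one list argument, so the length of the list is
;; not constrained by the implementation's maximum argument count.
(define (char-port->string port)
  (list->string (port->char-list port)))

;; With the file->char_list function above:
;; (list->string (file->char_list "mydata.txt"))
```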

Another method is:

;works in Chicken, Racket
(foldr (lambda (x y) (string-append (string x) y)) "" (file->char_list "mydata.txt"))

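There are two problems. First, foldr is not universally available (SISC lacks it), though it could be defined. Second, this method is very slow: each string-append copies the whole string built so far, so the total work grows quadratically with the length of the file.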

I wrote the next two functions to slice a list of chars into nested lists until no list at the lowest level exceeds Chicken's maximum argument count. The third function traverses the nested char list and returns a string using string and string-append:

(define (cleave_at n a)
  (cond
   ((null? a) '())
   ((zero? n) (list '() a))
   (#t 
    ((lambda (x)
      (cons (cons (car a) (car x)) (cdr x)))
     (cleave_at (- n 1) (cdr a))))))

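(define (cleave_binary_nest n a)
 (cond
  ((null? a) '())                       ; empty file: nothing to split
  ((equal? n (length a)) (list a))
  (#t
   ((lambda (x)
     (cond
      ;; test the second half: it is never shorter than the first,
      ;; so an odd-length list can no longer leave an (n+1)-char chunk
      ((> (length (cadr x)) n) (map (lambda (y) (cleave_binary_nest n y)) x))
      (#t x)))
    (cleave_at (floor (/ (length a) 2)) a)))))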

(define (binary_nest_char->string a)
 (cond
  ((null? a) "")
  ((char? (car a)) (apply string a))
  (#t (string-append
    (binary_nest_char->string (car a)) (binary_nest_char->string (cdr a))))))

The function is called like this:

;Works in Racket, Chicken, SISC
;faster than foldr method (3x faster interpreted Chicken) (30x faster compiled Chicken) (125x faster Racket gui)
(binary_nest_char->string (cleave_binary_nest 2048 (file->char_list "mydata.txt")))

To reduce the text to alphabetic characters and spaces, there are two more functions:

(define (alphaspace? x)
 (cond
  ((and (char-ci>=? x #\a) (char-ci<=? x #\z)) #t)
  ((equal? x #\space) #t)
  (#t #f)))

(define (filter pred lis)
  ; if lis is empty
  (if (null? lis)
    ; return an empty list
    '()
    ; otherwise, if the predicate is true on the first element
    (if (pred (car lis))
      ; return the first element concatenated with the
      ; result of calling filter on the rest of lis
      (cons (car lis) (filter pred (cdr lis)))
      ; otherwise (if the predicate was false) just
      ; return the result of filtering the rest of lis
      (filter pred (cdr lis)))))

(define data (file->char_list "mydata.txt"))
(define data_alphaspace (filter alphaspace? data))
(define result (binary_nest_char->string (cleave_binary_nest 2048 data_alphaspace)))

This works on Racket, Chicken (interpreted and compiled), and SISC (Java). Each of those dialects should also work on Linux, Mac (OS X), and Windows.

Upvotes: 15

GoZoner

Reputation: 70235

Maybe this will get you started.

(define (file->list-of-chars file)
  (with-input-from-file file
    (lambda ()
      (let reading ((chars '()))
        (let ((char (read-char)))
          (if (eof-object? char)
              (reverse chars)
              (reading (cons char chars))))))))
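To go from characters to the list of words the question asks for, one possible next step (a sketch, not part of the original answer) is to group consecutive alphabetic characters and drop everything else, such as the question mark and the digits. chars->words is a hypothetical name; char-alphabetic? is standard and, in implementations with Unicode support, also accepts non-ASCII letters.

```scheme
;; Split a list of characters into a list of words, where a word is a
;; maximal run of alphabetic characters; digits, punctuation and
;; whitespace only act as separators and are discarded.
(define (chars->words chars)
  (let loop ((chars chars)
             (word '())     ; letters of the current word, reversed
             (words '()))   ; finished words, reversed
    (define (flush)         ; add the current word, if any, to words
      (if (null? word)
          words
          (cons (list->string (reverse word)) words)))
    (cond ((null? chars)
           (reverse (flush)))
          ((char-alphabetic? (car chars))
           (loop (cdr chars) (cons (car chars) word) words))
          (else
           (loop (cdr chars) '() (flush))))))
```

Used as (chars->words (file->list-of-chars "mydata.txt")) on the mydata.txt file from the R7RS answer, this should give ("Hello" "world" "Is" "this" "microphone" "on" "Testing").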

Upvotes: 5
