exl
exl

Reputation: 1853

converting data from tabular (.csv) format into text files

I am trying to figure out how to convert a table of text bank data (multiple choice questions) into a standard format where each question is it's own separate .Rnw file. This allows me to create a test bank that I can use with the exams package for R to create different exams in written or computer presented formats.

I have test bank data in tabular format (.csv) Where the structured data looks like this (delimited by semicolon):

question.no;question.text;choice.a;choice.b;choice.c;choice.d;choice.e;answer;label.1;label.2
1;This is the question text of 1;text of choice a;text of choice b;text of choice c;text of choice d;text of choice e;A;question.type.1;question.type.2
2;This is the question text of 2;text of choice a;text of choice b;text of choice c;text of choice d;text of choice e;A;question.type.1;question.type.2

What I would like to do is to parse this file to create a separate .Rnw file for each row of data, where the output for row 1 would be:

\begin{question}
This is the question text of 1
\begin{answerlist} 
\item text of choice a
\item text of choice b
\item text of choice c
\item text of choice d
\item text of choice e
\end{answerlist}
\end{question}

\begin{solution}
The right answer is A
\end{solution}

\exname{defaggdemand}
\extype{schoice}
% \label.1{question.type.1}
% \label.2{question.type.2}
% \exsolution{10000}
\exshuffle{TRUE}

And this file would be named "question_1.Rnw", and the output for row 2 would look analogously like this:

\begin{question}
This is the question text of 2 
\begin{answerlist} 
\item text of choice a
\item text of choice b
\item text of choice c
\item text of choice d
\item text of choice e
\end{answerlist}
\end{question}

\begin{solution}
The right answer is A
\end{solution}

\exname{defaggdemand}
\extype{schoice}
% \label.1{question.type.1}
% \label.2{question.type.2}
% \exsolution{10000}
\exshuffle{TRUE}

and this file called be called "question_2.Rnw", based on the first column of the .csv data.

The idea is that the strategy would take a large .csv table as input, and output to a directory one .Rnw file per row of testbank data, translating the data from the csv into a directory of testbank questions ready to use with the exams package.

I have used text parsing approaches like sed or regular expressions to repair a printed text bank set of questions before to do this, but this is the first time I have testbank data in such a structured, uniform format.

I'm sure that I could kludge together some kind of text substitution approach that would take each delimiter and substitute the right text and set of line returns, but this seems error prone, and I suspect there is a most elegant way.

I would appreciate any pointers on how to figure out how to do this.

Upvotes: 0

Views: 111

Answers (3)

maverick928
maverick928

Reputation: 41

Here is how you could do it in python.

Basically, you read the file line by line. Ignore first line since it seems to be just column descriptions. From second line on, split each line on the delimiter. Assign the list values to bunch of variables to refer to later. Open a new file to write to. Use f.write option to write out your template combined with the variables saved above.

with open("q-and-a-sheet-template.csv", "r") as infile:
    next(infile)
    filecount = 1
    for line in infile:
        if line:
            num, question_text, choice_a, choice_b, choice_c, choice_d, choice_e, answer, tag1, tag2 = line.split(';')
            outfile = "outfile"+str(filecount)+".rnw"
            with open(outfile, "a") as f:
                f.write("\\begin{question}\n")
                f.write(question_text+"\n")
                f.write("\\begin{answerlist}\n")
                f.write("\\"+choice_a+"\n")
                f.write("\\"+choice_b+"\n")
                f.write("\\"+choice_c+"\n")
                f.write("\\"+choice_d+"\n")
                f.write("\\"+choice_e+"\n")
                f.write("\\end{answerlist}\n")
                f.write("\\end{question}\n")
                f.write("\n")
                f.write("\\begin{solution}\n")
                f.write("The right answer is" + answer +"\n")
                f.write("\\end{solution}\n")
                f.write("\n")
                f.write("\\exname{defaggdemand}\n")
                f.write("\\extype{schoice}\n")
                f.write("% \\label.1{"+tag1+"}\n")
                f.write("% \\label.1{"+tag2+"}\n")
                f.write("% \\exsolution{10000}\n")
                f.write("\\exshuffle{TRUE}")
        filecount += 1

Upvotes: 0

Ed Morton
Ed Morton

Reputation: 204015

$ cat tst.awk
BEGIN { FS=";" }
NR>1 {
    out = "\\begin{question}"
    out = out ORS $2
    for (i=3; i<=7; i++) {
        out = out ORS "\\item " $i
    }
    out = out ORS "\\end{answerlist}"
    out = out ORS "\\end{question}"
    out = out ORS
    out = out ORS "\\begin{solution}"
    out = out ORS "The right answer is " $(i++)
    out = out ORS "\\end{solution}"
    out = out ORS
    out = out ORS "\\exname{defaggdemand}"
    out = out ORS "\\extype{schoice}"
    c=0
    for (; i<=NF; i++) {
        out = out ORS "% \\label." ++c "{" $i "}"
    }
    out = out ORS "\\exsolution{10000}"
    out = out ORS "\\exshuffle{TRUE}"
    print out " > " ("question_" NR-1 ".Rnw")
    close("question_" NR-1 ".Rnw")
}

.

$ awk -f tst.awk file
\begin{question}
This is the question text of 1
\item text of choice a
\item text of choice b
\item text of choice c
\item text of choice d
\item text of choice e
\end{answerlist}
\end{question}

\begin{solution}
The right answer is A
\end{solution}

\exname{defaggdemand}
\extype{schoice}
% \label.1{question.type.1}
% \label.2{question.type.2}
\exsolution{10000}
\exshuffle{TRUE} > question_1.Rnw
\begin{question}
This is the question text of 2
\item text of choice a
\item text of choice b
\item text of choice c
\item text of choice d
\item text of choice e
\end{answerlist}
\end{question}

\begin{solution}
The right answer is A
\end{solution}

\exname{defaggdemand}
\extype{schoice}
% \label.1{question.type.1}
% \label.2{question.type.2}
\exsolution{10000}
\exshuffle{TRUE} > question_2.Rnw
$

Just change " > " to >.

Upvotes: 1

choroba
choroba

Reputation: 241988

Perl to the rescue!

A template for questions is kept in the DATA section. Text::CSV_XS is used to process the csv. The first line of the csv is skipped (where the first column contains question.no), other lines are used to populate the template - each %1, %2, etc. are replaced with the corresponding column value. The result is saved to the file whose name was created from the first column.

#!/usr/bin/perl
use warnings;
use strict;

use Text::CSV_XS qw{ csv };

my $template = do { local $/; <DATA> };

csv(in       => shift,
    sep_char => ';',
    out      => \ 'skip',
    on_in    => sub {
        return if 'question.no' eq $_[1][0];
        open my $out, '>', "question_$_[1][0].Rnw" or die $!;
        ( my $output = $template ) =~ s/%([0-9])/$_[1][$1]/g;
        print {$out} $output;
        close $out;
});

__DATA__
\begin{question}
%1
\begin{answerlist}
\item %2
\item %3
\item %4
\item %5
\item %6
\end{answerlist}
\end{question}

\begin{solution}
The right answer is %7
\end{solution}

\exname{defaggdemand}
\extype{schoice}
% \label.1{%8}
% \label.2{%9}
% \exsolution{10000}
\exshuffle{TRUE}

Upvotes: 2

Related Questions