First of all, I apologise in advance if my question is not very clear. I'm totally new to R and my terminology won't be that good.
We get an SPSS file from an external company that contains survey data. We have an R script that extracts the data and writes it to a CSV file. This works fine.
The second part of the script builds an INI-style file of all the possible answers. For example, for AGE we would have something like:
[ AGE ]
1 = Under 13
2 = 13 - 15
3 = 15 - 25
4 = 25+
The CSV file will have one of 1, 2, 3 or 4 on each line. Until recently, all possible answers were numbered starting from 1, but now some of them start from 0. Therefore we would like to have something like:
[ AGE ]
0 = Under 13
1 = 13 - 15
2 = 15 - 25
3 = 25+
The following is the current R code that we use. I know where it goes wrong, but I don't know how to correct it.
library(foreign)  # provides read.spss()

data <- read.spss(inputFile, to.data.frame = TRUE)
fileOut <- file(valuesExportFile, "w")
for (name in names(data)) {
  cat("[", name, "]\n", file = fileOut)
  variableValues <- levels(data[[name]])
  numberOfValues <- nlevels(data[[name]])
  if (numberOfValues > 0) {
    # i is the position of the level (always 1, 2, 3, ...),
    # not the code stored in the SPSS file
    for (i in 1:numberOfValues) {
      cat(i, '= "', variableValues[i], '"', "\n", file = fileOut)
    }
  }
}
close(fileOut)
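To see why the numbering is off: `levels()` and `nlevels()` only describe a factor's labels and their positions, and the loop writes the position `i` rather than any code from the SPSS file. The small factor below is built by hand (it is not read from an SPSS file) purely to illustrate that these positions start at 1 even if the original file coded "Under 13" as 0.

```r
# A hand-built factor standing in for a labelled SPSS variable
# (illustration only, not real survey data)
age <- factor(c("Under 13", "25+", "13 - 15"),
              levels = c("Under 13", "13 - 15", "15 - 25", "25+"))

levels(age)      # the labels, in level order
nlevels(age)     # 4
as.integer(age)  # level positions: 1 4 2 (never 0)
```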
I have spent a day and a half googling and trying various approaches. I did find a Perl script, spssread.pl, that extracts the data the way we want it, but for some reason all the label names come out in uppercase, which is not acceptable because they are case-sensitive. I will keep looking at this script, but in the meantime I would like to see whether there is a solution in R, since this is what we already use and it would be nice to have everything in one script.
So, any suggestions?
Thanks to Brian Diggs, I was able to explore another approach and I found a solution, although not a perfect one.
My solution was to read the data with use.value.labels=FALSE, then unclass each variable and use its value.labels attribute. I think showing the code will be clearer than me trying to explain it.
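library(foreign)  # provides read.spss()

# Keep the numeric codes instead of converting labelled variables to factors
data <- read.spss(inputFile, to.data.frame = TRUE, use.value.labels = FALSE)
fileOut <- file(valuesExportFile, "w")
for (name in names(data)) {
  cat("[", name, "]\n", file = fileOut)
  # Named vector: names are the labels, values are the codes
  variables <- attr(unclass(data[[name]]), "value.labels")
  for (label in names(variables)) {
    cat(variables[[label]], '= "', label, '"', "\n", file = fileOut)
  }
}
close(fileOut)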
The result
[ AGE ]
8 = " 65+ "
7 = " 55 to 64 "
6 = " 45 to 54 "
5 = " 35 to 44 "
4 = " 25 to 34 "
3 = " 21 to 24 "
2 = " 16 to 20 "
1 = " 13 to 15 "
0 = " Under 13 "
although workable, is not ideal. Does anyone know how I could sort them to get:
[ AGE ]
0 = " Under 13 "
1 = " 13 to 15 "
2 = " 16 to 20 "
3 = " 21 to 24 "
4 = " 25 to 34 "
5 = " 35 to 44 "
6 = " 45 to 54 "
7 = " 55 to 64 "
8 = " 65+ "
EDIT: 04/05/12
After some more help from Brian Diggs (see the comments), the final solution is:
library(foreign)  # provides read.spss()

data <- read.spss(inputFile, to.data.frame = TRUE, use.value.labels = FALSE)
fileOut <- file(valuesExportFile, "w")
for (name in names(data)) {
  cat("[", name, "]\n", file = fileOut)
  variables <- attr(unclass(data[[name]]), "value.labels")
  # Sort numerically by code; as.numeric() makes 10 sort after 9
  # even if the codes are stored as text
  variables <- variables[order(as.numeric(variables))]
  for (label in names(variables)) {
    cat(variables[[label]], '= "', label, '"', "\n", file = fileOut)
  }
}
close(fileOut)
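If the spaces that `cat()` inserts around each label (as in `8 = " 65+ "`) are unwanted, the same loop can be written with `sprintf()` and `writeLines()`. This is only a sketch: the function name `write_value_labels()` is hypothetical, and it assumes, like the code above, that each column carries a `value.labels` attribute (a named vector with labels as names and codes as values). Labels are written exactly as stored, so their case is preserved.

```r
# Hypothetical helper (not part of the original answer): writes one
# "[ NAME ]" section per column, with codes in ascending numeric order.
write_value_labels <- function(data, con) {
  for (name in names(data)) {
    cat("[ ", name, " ]\n", sep = "", file = con)
    # Named vector: names are the labels, values are the codes
    labels <- attr(unclass(data[[name]]), "value.labels")
    if (length(labels) == 0) next  # column without value labels
    labels <- labels[order(as.numeric(labels))]
    writeLines(sprintf('%s = "%s"', labels, names(labels)), con)
  }
}
```

In the script above, the `for` loop between `file()` and `close()` could then be replaced by `write_value_labels(data, fileOut)`.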