Reputation: 7517
I have a small dataset written in SPSS syntax which comes from Table 5.3 p. 189 of this book (type 210
in the page slot to see the table).
I was wondering if there might be a way to convert this data to .csv
file? (I want to use the data in R
afterwards)
# SPSS Code:
DATA LIST FREE/gpid anx socskls assert.
BEGIN DATA.
1 5 3 3 1 5 4 3 1 4 5 4 1 4 5 4
1 3 5 5 1 4 5 4 1 4 5 5 1 4 4 4
1 5 4 3 1 5 4 3 1 4 4 4
2 6 2 1 2 6 2 2 2 5 2 3 2 6 2 2
2 4 4 4 2 7 1 1 2 5 4 3 2 5 2 3
2 5 3 3 2 5 4 3 2 6 2 3
3 4 4 4 3 4 3 3 3 4 4 4 3 4 5 5
3 4 5 5 3 4 4 4 3 4 5 4 3 4 6 5
3 4 4 4 3 5 3 3 3 4 4 4
END DATA.
EDIT - in order to check answers I am adding here the actual way the data looks after reading it in SPSS :
gpid anx socskls assert
1 5 3 3
1 5 4 3
1 4 5 4
1 4 5 4
1 3 5 5
1 4 5 4
1 4 5 5
1 4 4 4
1 5 4 3
1 5 4 3
1 4 4 4
2 6 2 1
2 6 2 2
2 5 2 3
2 6 2 2
2 4 4 4
2 7 1 1
2 5 4 3
2 5 2 3
2 5 3 3
2 5 4 3
2 6 2 3
3 4 4 4
3 4 3 3
3 4 4 4
3 4 5 5
3 4 5 5
3 4 4 4
3 4 5 4
3 4 6 5
3 4 4 4
3 5 3 3
3 4 4 4
Upvotes: 2
Views: 110
Reputation: 42602
If I understand correctly, the 1st, 5th, 9th, and 13th column of the dataset belong to variable gpid
, the 2nd, 6th, 10th, and 14th column belong to variable anx
, and so on. So, we need to
Many roads lead to Rome.
This is what I would do using my favourite tools. In particular, this approach uses the feature of data.table::melt()
to reshape multiple measure columns simultaneously. There is no manual cleanup of the data section in a text editor required.
The resulting dataset result
can be used directly afterwards in any subsequent R
code as requested by the OP. There is no need to take a detour using a .csv
file (However, feel free to save result
as a .csv
file).
library(data.table)
library(magrittr)
cols <- c("gpid", "anx", "socskls", "assert")
raw <- fread(text = "
1 5 3 3 1 5 4 3 1 4 5 4 1 4 5 4
1 3 5 5 1 4 5 4 1 4 5 5 1 4 4 4
1 5 4 3 1 5 4 3 1 4 4 4
2 6 2 1 2 6 2 2 2 5 2 3 2 6 2 2
2 4 4 4 2 7 1 1 2 5 4 3 2 5 2 3
2 5 3 3 2 5 4 3 2 6 2 3
3 4 4 4 3 4 3 3 3 4 4 4 3 4 5 5
3 4 5 5 3 4 4 4 3 4 5 4 3 4 6 5
3 4 4 4 3 5 3 3 3 4 4 4",
fill = TRUE)
mv <- colnames(raw) %>%
matrix(ncol = 4L, byrow = TRUE) %>%
as.data.table() %>%
setnames(new = cols)
result <- melt(raw, measure.vars = mv, na.rm = TRUE)[
order(rowid(variable))][
, variable := NULL]
result
gpid anx socskls assert 1: 1 5 3 3 2: 1 5 4 3 3: 1 4 5 4 4: 1 4 5 4 5: 1 3 5 5 6: 1 4 5 4 7: 1 4 5 5 8: 1 4 4 4 9: 1 5 4 3 10: 1 5 4 3 11: 1 4 4 4 12: 2 6 2 1 13: 2 6 2 2 14: 2 5 2 3 15: 2 6 2 2 16: 2 4 4 4 17: 2 7 1 1 18: 2 5 4 3 19: 2 5 2 3 20: 2 5 3 3 21: 2 5 4 3 22: 2 6 2 3 23: 3 4 4 4 24: 3 4 3 3 25: 3 4 4 4 26: 3 4 5 5 27: 3 4 5 5 28: 3 4 4 4 29: 3 4 5 4 30: 3 4 6 5 31: 3 4 4 4 32: 3 5 3 3 33: 3 4 4 4 gpid anx socskls assert
fread()
returns a data.table raw
with default column names V1
, V2
, ... V16
and with missing values filled with NA
mv
is a data.table which indicates which columns of raw
belong to each target variable:
mv
gpid anx socskls assert 1: V1 V2 V3 V4 2: V5 V6 V7 V8 3: V9 V10 V11 V12 4: V13 V14 V15 V16
This informations is used by melt()
. melt()
also removes rows with missing values from the resulting long format.
After reshaping, the rows are ordered by the variable number but need to be reordered in the original row order by using rowid(variable)
. Finally, the variable
column is removed.
Giving a second thought, here is a streamlined version of the code which skips the creation of mv
and uses data.table
chaining:
library(data.table)
cols <- c("gpid", "anx", "socskls", "assert")
result <- fread(
text = "
1 5 3 3 1 5 4 3 1 4 5 4 1 4 5 4
1 3 5 5 1 4 5 4 1 4 5 5 1 4 4 4
1 5 4 3 1 5 4 3 1 4 4 4
2 6 2 1 2 6 2 2 2 5 2 3 2 6 2 2
2 4 4 4 2 7 1 1 2 5 4 3 2 5 2 3
2 5 3 3 2 5 4 3 2 6 2 3
3 4 4 4 3 4 3 3 3 4 4 4 3 4 5 5
3 4 5 5 3 4 4 4 3 4 5 4 3 4 6 5
3 4 4 4 3 5 3 3 3 4 4 4",
fill = TRUE, col.names = rep(cols, 4L))[
, melt(.SD, measure.vars = patterns(cols), value.name = cols, na.rm = TRUE)][
order(rowid(variable))][
, variable := NULL][]
result
Here, the columns are renamed within the call to fread()
. In this case, duplicated column names are desirable (as opposed to the usual use case) because the patterns()
function in the subsequent call to melt()
use the duplicated column names to combine the columns which belong to one measure variable.
Upvotes: 1
Reputation: 73842
Using readLines
and some string manipulating tools.
tmp <- readLines("spss1.txt") ## read from .txt
tmp <- trimws(gsub("[A-Z/.]", "", tmp)) ## remove caps and specials
nm <- strsplit(tmp[[1]], " ")[[1]] ## split names
tmp <- unlist(strsplit(tmp[3:11], "\\s{2,}") ) ## split data blocks
Finally, splitting at the spaces gives the result.
dat <- setNames(
type.convert(do.call(rbind.data.frame, strsplit(tmp, "\\s"))),
nm)
dat
# gpid anx socskls assert
# 1 1 5 3 3
# 2 1 5 4 3
# 3 1 4 5 4
# 4 1 4 5 4
# 5 1 3 5 5
# 6 1 4 5 4
# 7 1 4 5 5
# 8 1 4 4 4
# 9 1 5 4 3
# 10 1 5 4 3
# 11 1 4 4 4
# 12 2 6 2 1
# 13 2 6 2 2
# 14 2 5 2 3
# 15 2 6 2 2
# 16 2 4 4 4
# 17 2 7 1 1
# 18 2 5 4 3
# 19 2 5 2 3
# 20 2 5 3 3
# 21 2 5 4 3
# 22 2 6 2 3
# 23 3 4 4 4
# 24 3 4 3 3
# 25 3 4 4 4
# 26 3 4 5 5
# 27 3 4 5 5
# 28 3 4 4 4
# 29 3 4 5 4
# 30 3 4 6 5
# 31 3 4 4 4
# 32 3 5 3 3
# 33 3 4 4 4
Note: Results in the same Wilks' lambda as @emily-kothe's method. Maybe the authors used different data or your manova method is flawed?
Upvotes: 1
Reputation: 872
This requires some manual clean-up in Notepad or similar to place the data in the right format. But essentially, this could be imported using the following
df <- data.frame(
gpid = c(1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,
2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3),
anx = c(5,5,4,4,3,4,4,4,5,5,4,6,6,5,6,
4,7,5,5,5,5,6,4,4,4,4,4,4,4,4,4,5,4),
socskls = c(3,4,5,5,5,5,5,4,4,4,4,2,2,2,2,
4,1,4,2,3,4,2,4,3,4,5,5,4,5,6,4,3,4),
assert = c(3,3,4,4,5,4,5,4,3,3,4,1,2,3,2,
4,1,3,3,3,3,3,4,3,4,5,5,4,4,5,4,3,4)
)
write.csv(df, "df.csv", row.names = F)
Note that the first 4 values (1, 5, 3, 3) are the gpid, anx, socskls, and assert values for row 1. Whereas the values 1, 5, 4, 3 which appear to be in the next column of the pasted data in SPSS syntax (i.e. the next 4 values reading the syntax left to right) are actually the values for participant 10.
Note: I'm assuming you don't have SPSS installed. If you did the easiest option would using SPSS syntax to create the dataset in SPSS and then just export to R.
Upvotes: 1