Reputation: 59
I have a dataframe currently in wide format that has pre- and post- intervention questionnaire scores. There are 848 patients with at least one questionnaire. I would like to look at the data to compare how the scores changed from pre- to post-intervention. However, I am struggling since that data is in wide format and the data is not uniform in terms of the number of questionnaires per patient. There is a variable that lists the instance of the questionnaire and the value of the questionnaire is another variable.
Here is an example of how it currently looks:
a=c('instance1','total1','instance2', 'total2', 'instance3', 'total3',
'instance4','total4', 'instance5','total5')
b=c('postop2', '5', 'postop1', '7', NA, NA, 'preop', '10', NA, NA)
c=c(NA, NA, 'preop', '3', NA, NA, 'postop1', '4', 'postop2', '3')
data.frame(rbind(a,b,c))
There are 848 rows of questionnaire data.
I need to calculate the differences from preop to postop questionnaires. What is the best way to reorganize this data so I can get these values? I am struggling since the instances do not line up between all the 848 patients.
Thanks for you help.
Upvotes: 0
Views: 88
Reputation: 1008
Not sure if this is what you are after, but try this data.table approach:
library(data.table)
DT <- data.table(rbind(b,c))
names(DT) <- a
DT2 <- melt(DT, measure = patterns("^instance", "^total"), value.name = c("instance", "total"))
DT2
variable instance total
1: 1 postop2 5
2: 1 NA NA
3: 2 postop1 7
4: 2 preop 3
5: 3 NA NA
6: 3 NA NA
7: 4 preop 10
8: 4 postop1 4
9: 5 NA NA
10: 5 postop2 3
EDIT:
If you want a summary you could also do this:
DT2$total <- as.numeric(DT2$total) # for some reason total is class character
DT2[, sum(total), by=instance]
instance V1
1: postop2 8
2: NA NA
3: postop1 11
4: preop 13
Upvotes: 2