Reputation: 2341
I have a survey dataset with a (faulty) structure as follows:
library(data.table)
dat <- fread("q1 q2 ...1 ..2 q3 ..1
NA response other else response other
1 4 NA NA 1 NA")
dat
q1 q2 ...1 ..2 q3 ..1
1: NA response other else response other
2: 1 4 <NA> <NA> 1 <NA>
There are two problems with the data structure
...1
and ..2
(and sometimes more), belong to the question (q
) which comes before.What I did so far is to paste row 1 to the column names and then remove it:
names(dat) <- paste0(names(dat), dat[1,])
dat<- dat[-1,]
colnames(dat) <- sub("NA.*", "", colnames(dat))
q1 q2response ...1other ..2else q3response ..1other
1: 1 4 <NA> <NA> 1 <NA>
This however does not fix the ...1
and ..2
.
What would be the best way to replace these parts with the correct question?
Desired outcome:
q1 q2_response q2_other q2_else q3_response q3_other
1: 1 4 <NA> <NA> 1 <NA>
Upvotes: 1
Views: 39
Reputation: 6234
Here is a possible data.table
approach using zoo::na.locf()
to carry forward the missing question numbers:
library(data.table)
## carry forward question numbers
nms <- zoo::na.locf(replace(names(dat), grepl("\\.+", names(dat)), NA))
## append first row to question numbers
nms <- paste0(nms, ifelse(is.na(dat[1]), "", paste0("_", dat[1])))
setnames(dat, nms)
dat <- dat[-1]
dat
#> q1 q2_response q2_other q2_else q3_response q3_other
#> 1: 1 4 <NA> <NA> 1 <NA>
Upvotes: 1
Reputation: 47
Here is my solution
library(stringr)
library(dplyr)
CoLname <- data.frame(Names = c("q1","q2response", "...1other", "..2else", "q3response", "..1other"))
CoLname %>% mutate(Ques = str_extract(Names , pattern = "q[0-9]{1}"),
Ans = str_replace_all(Names , c("q[0-9]{1}" = "",
"^[\\.]+[0-9]+" ="")),
QuesGroup = ifelse(is.na(Ques), 0, 1),
QuesGroup = cumsum(QuesGroup)
) %>%
group_by(QuesGroup) %>%
mutate(NewCol = ifelse(Ans == "", Ques ,paste0(first(Ques), "_", Ans ))) %>% ungroup() -> CoLname
colnames(YourDataframe) <- CoLname$NewCol
Upvotes: 2