Reputation: 355
I have the following data frame in R after using melt on some wide-format data:
Condition value
C1SSC 4.5
C2SSC 7.7
TC1SSC 6.0
TC2SSC 7.3
PC1SSC 4.5
PC2SSC 5.7
Each character or substring has a specific meaning (for instance, TC2SSC means a condition where a textured [T] circle [C] was viewed with both eyes [2], and the response 'starting shape' was a circle [SSC]).
What I want to do is generate new columns derived from these characters and substrings: one for texture, one for shape, and so on. I thought about using grepl or substr, but I'm not sure whether these can evaluate specific parts of a string (e.g. when determining shape, checking whether the first two characters contain a 'C').
Ideally, this is what I'd end up with (example for TC2SSC):
Texture Shape View startShape value
T Circle 2 Circle 7.3
There are a lot of useful functions, but I'm not sure which is the best to use here. Any advice would be much appreciated.
Upvotes: 1
Views: 348
Reputation: 118879
Here's a straightforward way to approach the problem. Basically, use gsub with a pattern to wrap each token you want to split out in a delimiter (here "_"), and then call strsplit on the result. Here's how:
# wrap each token (C, SSC, or a run of digits) in "_", then split on runs of "_"
split.df <- data.frame(do.call(rbind, strsplit(gsub("(C|SSC|[0-9]+)", "_\\1_",
                df$Condition), "[_]+")), stringsAsFactors=FALSE)
# X1 X2 X3 X4
# 1 C 1 SSC
# 2 C 2 SSC
# 3 T C 1 SSC
# 4 T C 2 SSC
# 5 P C 1 SSC
# 6 P C 2 SSC
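To make the one-liner easier to follow, here is a minimal sketch of the same two steps run separately. The data frame is rebuilt from the table in the question, and the names `df`, `marked` and `parts` are assumptions chosen for illustration.

```r
# Sample data reconstructed from the question; the name `df` is an assumption.
df <- data.frame(
  Condition = c("C1SSC", "C2SSC", "TC1SSC", "TC2SSC", "PC1SSC", "PC2SSC"),
  value = c(4.5, 7.7, 6.0, 7.3, 4.5, 5.7),
  stringsAsFactors = FALSE
)

# Step 1: wrap every token (C, SSC, or a run of digits) in underscores.
marked <- gsub("(C|SSC|[0-9]+)", "_\\1_", df$Condition)
marked[4]
# "T_C__2__SSC_"

# Step 2: split on runs of underscores.
parts <- strsplit(marked, "[_]+")
parts[[1]]
# ""    "C"   "1"   "SSC"
parts[[4]]
# "T"   "C"   "2"   "SSC"
```

Conditions without a texture letter start with a delimiter, so strsplit returns an empty string as their first element. That is what keeps every row at four fields and lets rbind line the columns up.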
Now, the rest is pretty straightforward: change the names, convert the classes, and replace "C" with "Circle", etc.
names(split.df) <- c("Texture", "Shape", "View", "startShape")
split.df <- within(split.df, {
    Shape[Shape == "C"] <- "Circle"
    View <- as.numeric(View)
    startShape[startShape == "SSC"] <- "Circle"
})
cbind(split.df, value = df$value)
# Texture Shape View startShape value
# 1 Circle 1 Circle 4.5
# 2 Circle 2 Circle 7.7
# 3 T Circle 1 Circle 6.0
# 4 T Circle 2 Circle 7.3
# 5 P Circle 1 Circle 4.5
# 6 P Circle 2 Circle 5.7
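As an alternative that speaks to the question about checking specific parts of a string, a single anchored regular expression with capture groups can pull out each field by position. This is only a sketch: the pattern assumes an optional texture letter (T or P), a one-letter shape code, a single-digit view, and a start shape written as "SS" plus one letter. The names `pattern`, `fields`, `shape_labels` and `out` are hypothetical.

```r
# Sample data reconstructed from the question; the name `df` is an assumption.
df <- data.frame(
  Condition = c("C1SSC", "C2SSC", "TC1SSC", "TC2SSC", "PC1SSC", "PC2SSC"),
  value = c(4.5, 7.7, 6.0, 7.3, 4.5, 5.7),
  stringsAsFactors = FALSE
)

# Groups: (texture, optional) (shape) (view) SS (start shape)
pattern <- "^([TP]?)([A-Z])([0-9])SS([A-Z])$"

# regexec() returns the full match plus one entry per capture group;
# dropping the first column keeps only the groups.
m <- regmatches(df$Condition, regexec(pattern, df$Condition))
fields <- do.call(rbind, m)[, -1, drop = FALSE]

# Lookup table from code to label; extend it if other shapes occur.
shape_labels <- c(C = "Circle")

out <- data.frame(
  Texture    = fields[, 1],
  Shape      = unname(shape_labels[fields[, 2]]),
  View       = as.numeric(fields[, 3]),
  startShape = unname(shape_labels[fields[, 4]]),
  value      = df$value,
  stringsAsFactors = FALSE
)
```

Because the pattern is anchored with ^ and $, a condition string that does not fit the assumed layout yields no match rather than a silently misaligned row, so it is worth checking that every element of `m` has length 5 before binding.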
Upvotes: 2