luser

Reputation: 355

Creating values in separate columns that are dependent on different substrings

I have the following data frame in R after using melt on some wide-format data:

Condition value
C1SSC     4.5
C2SSC     7.7
TC1SSC    6.0
TC2SSC    7.3
PC1SSC    4.5
PC2SSC    5.7

Each character or substring has a specific meaning (for instance, TC2SSC means a condition where a textured [T] circle [C] was viewed with both eyes [2], and the response 'starting shape' was a circle [SSC]).

What I want to do is generate new variable columns that are dependent on these characters and substrings - one for texture, one for shape and so on. I thought about using grepl or substr, but I'm not sure if these can evaluate specific parts of strings (i.e. when ascertaining shape, checking the first two characters to see if they contain a 'C').

Ideally, this is what I'd end up with (example for TC2SSC):

Texture    Shape    View    startShape    value
T          Circle   2       Circle        7.3

There are a lot of useful functions, but I'm not sure which is the best to use here. Any advice would be much appreciated.

Upvotes: 1

Views: 348

Answers (1)

Arun

Reputation: 118879

Here's a straightforward way to approach the problem. Use gsub with a pattern to wrap every token you want to split out in a delimiter character (here "_"), then call strsplit on that delimiter. Here's how:

# Wrap each token (C, SSC, or a run of digits) in "_", then split on runs of "_"
split.df <- data.frame(do.call(rbind, strsplit(gsub("(C|SSC|[0-9]+)", "_\\1_", 
                      df$Condition), "[_]+")), stringsAsFactors=FALSE)

#   X1 X2 X3  X4
# 1     C  1 SSC
# 2     C  2 SSC
# 3  T  C  1 SSC
# 4  T  C  2 SSC
# 5  P  C  1 SSC
# 6  P  C  2 SSC
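
To see why every row ends up with exactly four pieces, it may help to run the two steps separately. This is only a sketch: it assumes the question's data lives in a data frame called df (the post does not name it), and the intermediate names marked and pieces are made up for illustration.

```r
# Recreate the question's data; the name `df` is an assumption
df <- data.frame(
  Condition = c("C1SSC", "C2SSC", "TC1SSC", "TC2SSC", "PC1SSC", "PC2SSC"),
  value = c(4.5, 7.7, 6.0, 7.3, 4.5, 5.7),
  stringsAsFactors = FALSE
)

# Step 1: wrap each recognised token (C, SSC, or digits) in underscores
marked <- gsub("(C|SSC|[0-9]+)", "_\\1_", df$Condition)
print(marked[1])  # "_C__1__SSC_"
print(marked[4])  # "T_C__2__SSC_"

# Step 2: split on runs of underscores.
# A delimiter at the start of a string yields a leading "" element,
# while a delimiter at the end yields nothing extra.
pieces <- strsplit(marked, "[_]+")
print(pieces[[1]])  # "" "C" "1" "SSC"
print(pieces[[4]])  # "T" "C" "2" "SSC"
```

The leading empty string is what fills the Texture column for conditions that have no texture letter, so rbind receives vectors of equal length.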

Now the rest is pretty straightforward: change the names, convert the classes, and replace codes such as "C" with "Circle".

names(split.df) <- c("Texture", "Shape", "View", "startShape")
split.df <- within(split.df, {
  Shape[Shape == "C"] <- "Circle"
  View <- as.numeric(View)
  startShape[startShape == "SSC"] <- "Circle"
})
cbind(split.df, value = df$value)

#   Texture  Shape View startShape value
# 1         Circle    1     Circle   4.5
# 2         Circle    2     Circle   7.7
# 3       T Circle    1     Circle   6.0
# 4       T Circle    2     Circle   7.3
# 5       P Circle    1     Circle   4.5
# 6       P Circle    2     Circle   5.7
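
On the question of whether a specific part of each string can be examined directly: one alternative to the gsub/strsplit approach above is a single regular expression with capture groups, read back with regexec and regmatches. This is a hedged sketch, not the method used above. The pattern assumes the only codes are the ones in the sample data (an optional T or P, then C, then digits, then SSC), and the names conditions, pattern, m, parts and parsed are illustrative.

```r
conditions <- c("C1SSC", "C2SSC", "TC1SSC", "TC2SSC", "PC1SSC", "PC2SSC")

# Groups: optional texture letter, shape letter, view digits, start-shape code.
# Assumption: only the codes seen in the sample data can occur.
pattern <- "^([TP]?)(C)([0-9]+)(SSC)$"
m <- regmatches(conditions, regexec(pattern, conditions))

# Each element is c(full match, group 1, ..., group 4);
# a non-matching string would give character(0), so check first
stopifnot(all(lengths(m) == 5))

# Drop the full match and stack the groups into a character matrix
parts <- do.call(rbind, lapply(m, function(x) x[-1]))

parsed <- data.frame(
  Texture = parts[, 1],
  Shape = parts[, 2],
  View = as.numeric(parts[, 3]),
  startShape = parts[, 4],
  stringsAsFactors = FALSE
)
print(parsed)
```

Because the texture group is optional, conditions without a texture letter get an empty string in Texture, the same as in the result above. The codes can then be recoded to labels in the same way as before.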

Upvotes: 2
