amisos55

Reputation: 1979

Splitting string variables in R

I need to split my string variables into separate columns. My data look like this:

test.data <- data.frame(id=c(101,101,101,101,101),
level=c( "levels p3 trunk slide.level", "levels p3 shark.level", 
"levels p3 wedge.level", "levels p3 tricky.level", "levels p4 annoying lever.level"),
badge=c( "springboard badge s", "lever badge s", "lever badge s", 
"ramp badge s", "lever badge s"))

> test.data
   id                          level               badge
1 101    levels p3 trunk slide.level springboard badge s
2 101          levels p3 shark.level       lever badge s
3 101          levels p3 wedge.level       lever badge s
4 101         levels p3 tricky.level        ramp badge s
5 101 levels p4 annoying lever.level       lever badge s

I need to split the "level" variable into two variables [PP, Level] and the "badge" variable into two variables [Item, Badge].

My data should look like this:

> test.data
   id         PP              Level                   Item          Badge
1 101        levels p3        trunk slide.level       springboard   badge s
2 101        levels p3        shark.level             lever         badge s
3 101        levels p3        wedge.level             lever         badge s
4 101        levels p3        tricky.level            ramp          badge s
5 101        levels p4        annoying lever.level    lever         badge s

Please note that the values of test.data$level start with a space. I tried the strsplit() function but could not get it to work. Could anybody help with this?

Best.

Upvotes: 0

Views: 100

Answers (1)

akrun

Reputation: 887891

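We can do this with two calls to extract() from tidyr. For the 'level' column, we match a word (\\w+) followed by one or more whitespace characters (\\s+) and another word (\\w+), and capture all of that as the first group by wrapping it in parentheses ((\\w+\\s+\\w+)). This is followed by one or more whitespace characters (\\s+), and the rest of the string is captured as the second group ((.*)). Similarly, the 'badge' column is split in two with a second regex: the first word is captured ((\\w+)), then optional whitespace is skipped (\\s*), and the remainder is captured ((.*)).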

library(tidyr)
test.data %>%
  extract(level, into = c('pp', 'level'), regex = '(\\w+\\s+\\w+)\\s+(.*)') %>%
  extract(badge, into = c('Item', 'Badge'), regex = '(\\w+)\\s*(.*)')
#   id        pp                level        Item   Badge
#1 101 levels p3    trunk slide.level springboard badge s
#2 101 levels p3          shark.level       lever badge s
#3 101 levels p3          wedge.level       lever badge s
#4 101 levels p3         tricky.level        ramp badge s
#5 101 levels p4 annoying lever.level       lever badge s
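
For comparison, here is a minimal base R sketch of the same idea without tidyr. It is not part of the approach above; it assumes the same two regexes can be reused with sub() and backreferences (\\1, \\2). The names lvl, bdg, pattern_level, pattern_badge and result are made up for illustration. Because the patterns are anchored with ^ and $ here, trimws() is applied first in case the values really do start with a space.

```r
# Sample data copied from the question
test.data <- data.frame(
  id = c(101, 101, 101, 101, 101),
  level = c("levels p3 trunk slide.level", "levels p3 shark.level",
            "levels p3 wedge.level", "levels p3 tricky.level",
            "levels p4 annoying lever.level"),
  badge = c("springboard badge s", "lever badge s", "lever badge s",
            "ramp badge s", "lever badge s"),
  stringsAsFactors = FALSE
)

# Strip leading/trailing whitespace so the anchored patterns below match
lvl <- trimws(test.data$level)
bdg <- trimws(test.data$badge)

# Illustrative pattern names (not from the original answer)
pattern_level <- "^(\\w+\\s+\\w+)\\s+(.*)$"  # first two words, then the rest
pattern_badge <- "^(\\w+)\\s*(.*)$"          # first word, then the rest

# sub() replaces the whole match with one captured group at a time
result <- data.frame(
  id    = test.data$id,
  pp    = sub(pattern_level, "\\1", lvl, perl = TRUE),
  level = sub(pattern_level, "\\2", lvl, perl = TRUE),
  Item  = sub(pattern_badge, "\\1", bdg, perl = TRUE),
  Badge = sub(pattern_badge, "\\2", bdg, perl = TRUE),
  stringsAsFactors = FALSE
)

print(result)
```

One caveat with this sketch: sub() returns the input unchanged when the pattern does not match, so a value with fewer words than expected would be copied into both new columns rather than flagged.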

Upvotes: 2
