Reputation:
I have a variable in a R data frame with this format, are are have a mixed format:
98 (56:78)
34 (45:67)
45
67 (34:98)
I want do divide that column into 3 values,and to put each value in a new column of the data frame.
98 56 78
34 45 67
45 NA NA
67 34 98
how can I do that? thank you very much!
Upvotes: 0
Views: 165
Reputation: 96
A tidy solution using extract()
:
library(tidyverse)
df <- data_frame(a = c("98 (56:78)", "10"))
df %>%
extract(a, into = c("x", "y", "z"), regex = "(\\d*) ?\\(?(\\d*):?(\\d*)") %>%
mutate_at(vars(x, y, z), as.integer)
Which results in:
# A tibble: 2 x 3
x y z
<int> <int> <int>
1 98 56 78
2 10 NA NA
Upvotes: 0
Reputation: 206596
With your sample data
dd <- data.frame(a=c("98 (56:78)", "45"))
You can do this using the tidyverse (and the tidyr package in particular) using
library(tidyverse)
dd %>% separate(a, c("x","y","z"), extra="drop")
# x y z
# 1 98 56 78
# 2 45 <NA> <NA>
Upvotes: 3
Reputation: 38520
Here is a longer base R method that returns a nice looking data.frame.
setNames(do.call(rbind.data.frame,
regmatches(temp, regexec("(\\d+) \\((\\d+):(\\d+)\\)$", temp)))[-1],
paste0("V", 1:3))
Here, regexec
selects the desired numbers using the regular expression "(\d+) \((\d+):(\d+)\)$" and regmatches
returns a list of length equal to its inputs with each element a length of 4. This list is turned into a data.frame with do.call
and rbind.data.frame
and the first column, which contains the full string, is removed. Finally, setNames
replaces the ugly default names with more manageable ones.
This returns
V1 V2 V3
1 98 56 78
2 91 53 76
3 99 53 75
Note that the variables are factor variables in this context, which are usually not very useful. To convert these to numeric, you can use the dat[] <- lapply(dat, as.numeric)
paradigm.
data
temp <- c("98 (56:78)", "91 (53:76)", "99 (53:75)")
Upvotes: 0