Reputation: 11
I have a data frame in which each observation in one column is a string of the form "x~y", where x and y are integers.
The goal is to transform each "x~y" string into a vector, c(x..y), that is, a sequence of integers that starts at x and ends at y.
Finally, the data frame needs to be unnested so that each element of the vector gets its own row and the other columns are properly repeated.
For example, here's a data frame:
A B
A1 -1~1
A2 1~3
A3 2~4
The above data frame should be changed to the following:
A B
A1 -1
A1 0
A1 1
A2 1
A2 2
A2 3
A3 2
A3 3
A3 4
Writing a separate str_replace call for every possible value is not practical because there are too many cases. How can I write code that handles this in general?
Upvotes: 0
Views: 731
Reputation: 3115
Since your B column can be easily transformed into an expression that gives you what you want, I would use the following approach.
# Using tidyverse for stringr (str_replace), tidyr (unnest), and purrr (map)
library(tidyverse)
# recreating your dataframe
df <- data.frame(A=c("A1","A2","A3"),B=c("-1~1","1~3","2~4"), stringsAsFactors = FALSE)
This solution has three parts. First, transform each string in the B column into a seq expression, so that "x~y" becomes "seq(x,y,by=1)".
df$B <- str_replace(df$B, "~", ",")
df$B <- paste0("seq(", df$B, ",by=1)")
Second, evaluate those strings. One of the nice things about R is that if you can generate strings containing R expressions, you can evaluate them with eval(parse()), like this:
df$B <- map(df$B, ~ eval(parse(text=.)))
Alternatively, you could pass map() a function that takes your original "x~y" character strings and returns the integer vector you want, but I think this solution involves the least typing.
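For reference, here is a minimal sketch of that function-based alternative. The helper name expand_range is hypothetical (it is not from any package), and the sketch assumes every string has exactly one "~" between two integers. It avoids eval(parse()), which may be preferable if the strings come from a source you do not control, since eval() runs whatever code the string contains.

```r
library(purrr)

# Hypothetical helper: split "x~y" on the literal "~" and
# return the integer sequence from x to y.
expand_range <- function(s) {
  bounds <- as.integer(strsplit(s, "~", fixed = TRUE)[[1]])
  seq(bounds[1], bounds[2])
}

# Same example data as above
df <- data.frame(A = c("A1", "A2", "A3"),
                 B = c("-1~1", "1~3", "2~4"),
                 stringsAsFactors = FALSE)

# B becomes a list column holding one integer vector per row
df$B <- map(df$B, expand_range)
```

Note that seq(x, y) counts downward when y is smaller than x, whereas seq(x, y, by=1) stops with an error in that case, so pick whichever behaviour suits your data.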
However you've done it, you now have a B column where each observation is an integer vector.
> df
A B
1 A1 -1, 0, 1
2 A2 1, 2, 3
3 A3 2, 3, 4
For the final step, unnest the vectors in B with the tidyr function unnest(). This automatically repeats the A column values in rows as needed. Name the column to unnest explicitly, because tidyr 1.0.0 and later warns when it is left out.
> df <- unnest(df, B)
> df
A B
1 A1 -1
2 A1 0
3 A1 1
4 A2 1
5 A2 2
6 A2 3
7 A3 2
8 A3 3
9 A3 4
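If you would rather do the whole thing in one pipeline, the sketch below is one possible variant, not the approach above. It assumes dplyr, tidyr and purrr are installed. The column names from and to and the name result are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

df <- data.frame(A = c("A1", "A2", "A3"),
                 B = c("-1~1", "1~3", "2~4"),
                 stringsAsFactors = FALSE)

result <- df %>%
  # split "x~y" into two columns; convert = TRUE turns them into integers
  separate(B, into = c("from", "to"), sep = "~", convert = TRUE) %>%
  # build one sequence per row and store it in a list column
  mutate(B = map2(from, to, seq)) %>%
  select(A, B) %>%
  # one row per sequence element, repeating A as needed
  unnest(B)
```

Here separate() handles the string parsing, so no expression strings are built or evaluated.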
Upvotes: 1