Reputation: 1113
I would like to create a new data frame with two new columns.
The first, v2_split, is obtained by splitting the comma-separated values in v2
so that each value gets its own row, with the corresponding v1 value repeated on each of those rows.
The second, v3, holds the v1 value divided by the number of comma-separated values
in the corresponding row of the original df.
In other words, given a row like this one:
> df[1,]
  v1  v2
1  1 1,3
Then the new df would have the following new entries in the first two rows:
  v1 v2_split  v3
1  1        1 0.5
2  1        3 0.5
Below is a reproducible example:
v1 <- c(1,5,3,7,9,3,2,5,NA,7)
v2 <- c("1,3","2","0.05,4,6,7","0",NA,"6","7","10,11","9","0.1")
df <- data.frame(v1,v2)
df$v2 <- as.character(df$v2)
v2_split <- as.numeric(unlist(strsplit(df$v2,",")))
Upvotes: 0
Views: 44
Reputation: 9656
Not sure I understood the question completely, but it seems you want the following:
# Split each v2 string on commas; an NA entry stays a single NA
v2_split <- strsplit(df$v2, ",")
n <- lengths(v2_split)  # number of values per original row
df <- data.frame(v1 = rep(df$v1, n),
                 v2 = as.numeric(unlist(v2_split)),
                 v3 = rep(df$v1 / n, n))
And the result:
> df
   v1    v2   v3
1   1  1.00 0.50
2   1  3.00 0.50
3   5  2.00 5.00
4   3  0.05 0.75
5   3  4.00 0.75
6   3  6.00 0.75
7   3  7.00 0.75
8   7  0.00 7.00
9   9    NA 9.00
10  3  6.00 3.00
11  2  7.00 2.00
12  5 10.00 2.50
13  5 11.00 2.50
14 NA  9.00   NA
15  7  0.10 7.00
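If you want the column named v2_split as in the question, and would rather not overwrite df, the same idea could be wrapped in a small helper. This is only a sketch: the function name split_rows and its argument are made up for illustration, and it assumes the input has columns called v1 and v2 exactly as in the reproducible example.

```r
# Hypothetical helper: the name and signature are assumptions, not from the post.
split_rows <- function(data) {
  # One character vector per row; an NA in v2 stays a single NA
  parts <- strsplit(as.character(data$v2), ",")
  # Number of comma-separated values in each original row
  n <- lengths(parts)
  data.frame(
    v1       = rep(data$v1, n),            # repeat v1 once per split value
    v2_split = as.numeric(unlist(parts)),  # each split value on its own row
    v3       = rep(data$v1 / n, n)         # v1 shared equally among the values
  )
}

# Data from the question
v1 <- c(1, 5, 3, 7, 9, 3, 2, 5, NA, 7)
v2 <- c("1,3", "2", "0.05,4,6,7", "0", NA, "6", "7", "10,11", "9", "0.1")
res <- split_rows(data.frame(v1, v2))
head(res, 2)
```

Note that an empty string in v2 would split into zero values, so that row would drop out of the result; the example data has no such case.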
Upvotes: 1