MIH

Reputation: 1113

Creating new data.frame by splitting comma separated entries in old one

I would like to create a new data frame from df with one row for each comma-separated value in v2. It should have three columns: v1, repeated for every value that came from the same original row; v2_split, holding the individual values obtained by splitting v2 on commas; and v3, equal to v1 divided by the number of comma-separated values in the corresponding original row. For example, given this first row:

> df[1,]
  v1  v2
1  1 1,3

Then the new df would start with the following two rows:

  v1 v2_split  v3
1  1  1       0.5
2  1  3       0.5

Below is a reproducible example:

v1 <- c(1,5,3,7,9,3,2,5,NA,7)
v2 <- c("1,3","2","0.05,4,6,7","0",NA,"6","7","10,11","9","0.1")
df <- data.frame(v1,v2)
df$v2 <- as.character(df$v2)
v2_split <- as.numeric(unlist(strsplit(df$v2,",")))

Upvotes: 0

Views: 44

Answers (1)

Karolis Koncevičius

Reputation: 9656

Not sure I understood the question completely, but it seems you want the following:

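# Split each v2 entry on commas; gives one character vector per original row
v2_split <- strsplit(df$v2, ",")
# Number of values in each original row
n <- lengths(v2_split)

df <- data.frame(v1 = rep(df$v1, n),
                 v2 = as.numeric(unlist(v2_split)),
                 v3 = rep(df$v1 / n, n))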

And the result:

> df

   v1    v2   v3
1   1  1.00 0.50
2   1  3.00 0.50
3   5  2.00 5.00
4   3  0.05 0.75
5   3  4.00 0.75
6   3  6.00 0.75
7   3  7.00 0.75
8   7  0.00 7.00
9   9    NA 9.00
10  3  6.00 3.00
11  2  7.00 2.00
12  5 10.00 2.50
13  5 11.00 2.50
14 NA  9.00   NA
15  7  0.10 7.00
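
If you need this more than once, the same steps can be wrapped in a small function. This is only a sketch in base R under the assumptions of the example above: the name split_rows is made up for illustration, and the columns are assumed to be called v1 and v2. Note that a row where v2 is NA is kept as a single row with v2 = NA, because strsplit() returns NA for it and lengths() counts that as one value.

```r
# Hypothetical helper (name chosen for illustration only): expands the
# comma-separated entries of v2 into one row each and divides v1 equally
# among the rows that came from the same original row.
split_rows <- function(d) {
  parts <- strsplit(as.character(d$v2), ",")  # one character vector per row
  n <- lengths(parts)                         # number of values per original row
  data.frame(
    v1 = rep(d$v1, n),                        # v1 repeated once per split value
    v2 = as.numeric(unlist(parts)),           # the individual split values
    v3 = rep(d$v1 / n, n)                     # v1 shared equally among them
  )
}

# Same data as in the question
v1 <- c(1, 5, 3, 7, 9, 3, 2, 5, NA, 7)
v2 <- c("1,3", "2", "0.05,4,6,7", "0", NA, "6", "7", "10,11", "9", "0.1")
res <- split_rows(data.frame(v1, v2, stringsAsFactors = FALSE))
```

A quick sanity check is that v3 summed over the new rows should equal the sum of the original v1 (ignoring NA), since each v1 is only split up, never changed.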

Upvotes: 1
