Alan20

Reputation: 281

r: Split a row with several items into one column by an id

I have data where one row has several items in it.

My actual dataframe is much larger, but here is an example dataframe to illustrate my problem:

shapeId verticeCoordinates
      3 [0,0][0,1][1,1][1,0]
      7 [0,0][2,0][2,1]
     10 [0,0][1,0][0,1][2,2][2,3]

I want each row to contain only one set of vertex coordinates, together with its corresponding shapeId.

I would like the data to be in the following format (shown here for shapeId 3 only):

shapeId verticeCoordinates
      3 [0,0]
      3 [0,1]
      3 [1,1]
      3 [1,0]

Reproducible example data:

df <- structure(list(shapeId = c(3L, 7L, 10L), verticeCoordinates = c("[0,0][0,1][1,1][1,0]", 
"[0,0][2,0][2,1]", "[0,0][1,0][0,1][2,2][2,3]")), class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 1

Views: 210

Answers (2)

ThomasIsCoding

Reputation: 101099

A data.table option

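library(data.table)

# Convert df to a data.table by reference, then split each string after every "]"
setDT(df)[
  ,
  .(verticeCoordinates = unlist(
    strsplit(verticeCoordinates, "(?<=\\])", perl = TRUE)
  )),
  by = shapeId
]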

gives

    shapeId verticeCoordinates
 1:       3              [0,0]
 2:       3              [0,1]
 3:       3              [1,1]
 4:       3              [1,0]
 5:       7              [0,0]
 6:       7              [2,0]
 7:       7              [2,1]
 8:      10              [0,0]
 9:      10              [1,0]
10:      10              [0,1]
11:      10              [2,2]
12:      10              [2,3]
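
To see what the split pattern does on its own, here is a minimal sketch on a single string taken from the example data. The variable names `x` and `pieces` are only illustrative and are not part of the answer above.

```r
# One string from the example data.
x <- "[0,0][0,1][1,1][1,0]"

# "(?<=\\])" is a zero-width lookbehind: it matches the empty position
# right after each "]", so the split consumes no characters.
# perl = TRUE is needed because lookbehind is PCRE syntax.
pieces <- strsplit(x, "(?<=\\])", perl = TRUE)[[1]]
print(pieces)
```

Each element of `pieces` should be one complete bracketed pair, which is why the results above need no further cleaning.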

A base R option

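# Split each string after every "]", name the pieces by shapeId,
# then stack() them into a long data frame (values, ind).
# rev() puts the id column first; note that stack() returns it as a factor.
with(
  df,
  setNames(
    rev(
      stack(
        setNames(
          strsplit(verticeCoordinates, "(?<=\\])", perl = TRUE),
          shapeId
        )
      )
    ),
    names(df)
  )
)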

gives

   shapeId verticeCoordinates
1        3              [0,0]
2        3              [0,1]
3        3              [1,1]
4        3              [1,0]
5        7              [0,0]
6        7              [2,0]
7        7              [2,1]
8       10              [0,0]
9       10              [1,0]
10      10              [0,1]
11      10              [2,2]
12      10              [2,3]

Upvotes: 1

Ronak Shah

Reputation: 388817

You can split each string just before every opening square bracket ([) and put each piece in its own row. filter() then drops any empty strings the split leaves behind.

library(dplyr)  # provides %>%

tidyr::separate_rows(df, verticeCoordinates, sep = '(?=\\[)') %>%
  dplyr::filter(verticeCoordinates != '')

#   shapeId verticeCoordinates
#     <int> <chr>             
# 1       3 [0,0]             
# 2       3 [0,1]             
# 3       3 [1,1]             
# 4       3 [1,0]             
# 5       7 [0,0]             
# 6       7 [2,0]             
# 7       7 [2,1]             
# 8      10 [0,0]             
# 9      10 [1,0]             
#10      10 [0,1]             
#11      10 [2,2]             
#12      10 [2,3]             
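
If you would rather avoid the filtering step, one possible alternative is to extract each bracketed pair instead of splitting on it. This is a base R sketch, not part of the answer above; the names `parts` and `out` are illustrative, and it assumes every coordinate pair is wrapped in `[` and `]` with no nested brackets.

```r
# Example data from the question.
df <- data.frame(
  shapeId = c(3L, 7L, 10L),
  verticeCoordinates = c("[0,0][0,1][1,1][1,0]",
                         "[0,0][2,0][2,1]",
                         "[0,0][1,0][0,1][2,2][2,3]")
)

# Find every "[...]" group in each string; regmatches() returns
# one character vector of matches per input row.
parts <- regmatches(df$verticeCoordinates,
                    gregexpr("\\[[^]]*\\]", df$verticeCoordinates))

# Repeat each shapeId once per extracted pair and flatten the list.
out <- data.frame(
  shapeId = rep(df$shapeId, lengths(parts)),
  verticeCoordinates = unlist(parts)
)
print(out)
```

Because only matched pairs are kept, no empty strings are produced, and shapeId keeps its integer type.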

Upvotes: 2
