I have data where each row holds several items in a single cell.
My actual data frame is much larger, but here is an example data frame to illustrate the problem:
shapeId verticeCoordinates
3 [0,0][0,1][1,1][1,0]
7 [0,0][2,0][2,1]
10 [0,0][1,0][0,1][2,2][2,3]
I want each row to hold only one coordinate pair, together with its corresponding shapeId.
I would like the data to be in the following format (shown for shapeId 3; the other shapes follow the same pattern):
shapeId verticeCoordinates
3 [0,0]
3 [0,1]
3 [1,1]
3 [1,0]
Reproducible example data:
df <- structure(list(shapeId = c(3L, 7L, 10L), verticeCoordinates = c("[0,0][0,1][1,1][1,0]",
"[0,0][2,0][2,1]", "[0,0][1,0][0,1][2,2][2,3]")), class = "data.frame", row.names = c(NA,
-3L))
A data.table option:
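library(data.table)

# setDT() converts df to a data.table by reference; then, within each shapeId,
# split after every "]" (zero-width lookbehind) and unlist the pieces into rows
setDT(df)[
  ,
  .(verticeCoordinates = unlist(
    strsplit(verticeCoordinates, "(?<=\\])", perl = TRUE)
  )),
  shapeId
]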
gives
shapeId verticeCoordinates
1: 3 [0,0]
2: 3 [0,1]
3: 3 [1,1]
4: 3 [1,0]
5: 7 [0,0]
6: 7 [2,0]
7: 7 [2,1]
8: 10 [0,0]
9: 10 [1,0]
10: 10 [0,1]
11: 10 [2,2]
12: 10 [2,3]
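The split works because "(?<=\\])" is a zero-width lookbehind: it matches the empty position right after each "]", so no characters are consumed and every pair keeps both of its brackets. A small sketch checking this on the first example value only (perl = TRUE is required for lookbehind):

```r
# One value from the example data
x <- "[0,0][0,1][1,1][1,0]"

# Split at each position that is preceded by "]"; nothing is removed
pieces <- strsplit(x, "(?<=\\])", perl = TRUE)[[1]]
print(pieces)  # "[0,0]" "[0,1]" "[1,1]" "[1,0]"
```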
A base R option:
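# stack() returns two columns (values, ind); rev() swaps them to (ind, values)
# and the outer setNames() restores the original column names
with(
  df,
  setNames(
    rev(
      stack(
        # name each list element by its shapeId so stack() can record it
        setNames(
          strsplit(verticeCoordinates, "(?<=\\])", perl = TRUE),
          shapeId
        )
      )
    ),
    names(df)
  )
)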
gives
shapeId verticeCoordinates
1 3 [0,0]
2 3 [0,1]
3 3 [1,1]
4 3 [1,0]
5 7 [0,0]
6 7 [2,0]
7 7 [2,1]
8 10 [0,0]
9 10 [1,0]
10 10 [0,1]
11 10 [2,2]
12 10 [2,3]
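One thing to watch with the base R version: stack() stores the list names in a factor column, so shapeId comes back as a factor rather than an integer. A minimal sketch of converting it back, assuming the same example data (rebuilt here so the block runs on its own). Calling as.integer() on the factor directly would return the internal level codes instead of the IDs, so the labels are converted via as.character() first:

```r
# Example data from the question, rebuilt so this block runs on its own
df <- data.frame(
  shapeId = c(3L, 7L, 10L),
  verticeCoordinates = c("[0,0][0,1][1,1][1,0]", "[0,0][2,0][2,1]",
                         "[0,0][1,0][0,1][2,2][2,3]"),
  stringsAsFactors = FALSE
)

# Same base R approach as above
out <- with(
  df,
  setNames(
    rev(stack(setNames(strsplit(verticeCoordinates, "(?<=\\])", perl = TRUE), shapeId))),
    names(df)
  )
)

class(out$shapeId)  # "factor": stack() keeps the list names in a factor column

# Convert through the labels, not the level codes
out$shapeId <- as.integer(as.character(out$shapeId))
str(out)
```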
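You can split the data before every opening square bracket ([) and create new rows. Because each string starts with [, the split at the very beginning produces an empty string, which the filter() step removes.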
library(tidyr)
library(dplyr)  # provides the %>% pipe

tidyr::separate_rows(df, verticeCoordinates, sep = '(?=\\[)') %>%
  dplyr::filter(verticeCoordinates != '')
# shapeId verticeCoordinates
# <int> <chr>
# 1 3 [0,0]
# 2 3 [0,1]
# 3 3 [1,1]
# 4 3 [1,0]
# 5 7 [0,0]
# 6 7 [2,0]
# 7 7 [2,1]
# 8 10 [0,0]
# 9 10 [1,0]
#10 10 [0,1]
#11 10 [2,2]
#12 10 [2,3]
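An alternative sketch, not taken from any of the answers: instead of splitting, extract every bracketed pair with regmatches() and gregexpr(), then repeat each shapeId once per pair found. Nothing is split, so no empty strings appear and shapeId stays an integer. It assumes every coordinate pair is enclosed in [ and ] with no nested brackets; the names pairs and long are illustrative only.

```r
# Example data from the question, rebuilt so this block runs on its own
df <- data.frame(
  shapeId = c(3L, 7L, 10L),
  verticeCoordinates = c("[0,0][0,1][1,1][1,0]", "[0,0][2,0][2,1]",
                         "[0,0][1,0][0,1][2,2][2,3]"),
  stringsAsFactors = FALSE
)

# Pull out every "[...]" group; [^]]* matches any run of characters except "]"
pairs <- regmatches(
  df$verticeCoordinates,
  gregexpr("\\[[^]]*\\]", df$verticeCoordinates, perl = TRUE)
)

# Repeat each shapeId once per extracted pair, then flatten the pairs
long <- data.frame(
  shapeId = rep(df$shapeId, lengths(pairs)),
  verticeCoordinates = unlist(pairs),
  stringsAsFactors = FALSE
)
long
```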