Reputation: 111
I have a dataframe with nested lists in the columns. Since it has been read in from a csv, the columns are formatted as characters.
Dummy data for reproduction:
col1 <- c("list1",
"list2",
"emptylist",
"list3")
col2 <-c("[{'test': 1, 'test2': 2, 'test3': 'string'}, {'test': 1, 'test2': 2, 'test3': 'string2'}]",
"[{'test3': 5, 'test4': 6, 'test5': 'string3'}, {'test': 7, 'test6': 8, 'test7': 'string4'}]",
"",
"[{'test9': 9, 'test10': 10, 'test11': 'string11'}, {'test12': 12, 'test13': 13, 'test14': 'string14'}]")
mydf <- data.frame(col1, col2)
In order to use something like mydf %>% dplyr::unnest(col2)
, I first need to format the column from character to list. Is this possible, if not, what alternative solution might be used?
Upvotes: 1
Views: 241
Reputation: 886938
It may be easier to do with fromJSON
from jsonlite
library(jsonlite)
lapply(gsub("'", '"', mydf$col2), fromJSON)
[[1]]
test test2 test3
1 1 2 string
2 1 2 string2
[[2]]
test3 test4 test5 test test6 test7
1 5 6 string3 NA NA <NA>
2 NA NA <NA> 7 8 string4
With the new dataset, just subset the ones that are not blank with either !=
or nzchar
i1 <- nzchar(mydf$col2)
mydf$col3[i1] <- lapply(gsub("'", '"', mydf$col2[i1]), fromJSON)
-output
> mydf$col3
[[1]]
test test2 test3
1 1 2 string
2 1 2 string2
[[2]]
test3 test4 test5 test test6 test7
1 5 6 string3 NA NA <NA>
2 NA NA <NA> 7 8 string4
[[3]]
NULL
[[4]]
test9 test10 test11 test12 test13 test14
1 9 10 string11 NA NA <NA>
2 NA NA <NA> 12 13 string14
Upvotes: 2