Reputation: 297
I have a data frame with the following structure:
> ls.str(df)
attachments : 'data.frame': 1103947 obs. of 2 variables:
$ media_keys:List of 1103947
$ poll_ids :List of 1103947
author_id : chr [1:1103947] "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" ...
conversation_id : chr [1:1103947] "1006266341341519872" "1006265425791987715" "1006251577747869696" "1006246236171722753" "1006246168991600642" ...
created_at : chr [1:1103947] "2018-06-11T20:06:05.000Z" "2018-06-11T20:02:27.000Z" "2018-06-11T19:07:26.000Z" "2018-06-11T18:46:12.000Z" ...
entities : 'data.frame': 1103947 obs. of 5 variables:
$ mentions :List of 1103947
$ annotations:List of 1103947
$ hashtags :List of 1103947
$ urls :List of 1103947
$ cashtags :List of 1103947
geo : 'data.frame': 1103947 obs. of 2 variables:
$ place_id : chr NA NA NA NA ...
$ coordinates:'data.frame': 1103947 obs. of 2 variables:
id : chr [1:1103947] "1006266341341519872" "1006265425791987715" "1006251577747869696" "1006246236171722753" "1006246168991600642" ...
in_reply_to_user_id : chr [1:1103947] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ...
and I want to convert it to a tidy format. Is there a neat little function for this that I don't know of? Google hasn't been too much help. Thanks in advance!
By tidy format, I mean something like this:
#> # A tibble: 25 × 31
#> tweet_id user_username text conversation_id author_id lang created_at
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1406007405180… Phardiga "RT … 14060074051803… 58755490 de 2021-06-1…
#> 2 1405617386405… dorothee_goe… "RT … 14056173864058… 97759337… de 2021-06-1…
#> 3 1405616047990… dejools "RT … 14056160479909… 13065071… de 2021-06-1…
#> 4 1405615055555… LenaOetzel "RT … 14056150555557… 97897581… de 2021-06-1…
#> 5 1405613064968… jenniferhenk… "RT … 14056130649684… 114774406 de 2021-06-1…
#> 6 1405610724026… Tobias_Schul… "Ihr… 14056107240266… 47919307 de 2021-06-1…
#> 7 1405393033558… HTMIBerlin "👩💻… 14053930335589… 94052353… und 2021-06-1…
#> 8 1404808751857… Tobias_Schul… ".@j… 14048087518576… 47919307 de 2021-06-1…
#> 9 1404440929881… ASattelmacher "Oka… 14044409298812… 11508518… de 2021-06-1…
#> 10 1404393457427… dr_john_aus_b "#Ic… 14043934574273… 30635588… und 2021-06-1…
Upvotes: 1
Views: 721
Reputation: 887881
With tidyr, we could use unpack (to unpack the data.frame columns into regular columns) and then unnest to convert the list columns to regular columns:
library(dplyr)
library(tidyr)
df %>%
  unpack(where(is.data.frame)) %>%
  unnest(where(is.list))
-output
# A tibble: 3 × 6
media_keys poll_ids author_id conversation_id mentions annotations
<int> <int> <int> <int> <int> <int>
1 1 4 1 1 1 4
2 2 5 2 2 2 5
3 3 6 3 3 3 6
-data
df <- structure(list(attachments = structure(list(media_keys = structure(list(
1L, 2L, 3L), class = "AsIs"), poll_ids = structure(list(4L,
5L, 6L), class = "AsIs")), class = "data.frame", row.names = c(NA,
-3L)), author_id = 1:3, conversation_id = 1:3, entities = structure(list(
mentions = structure(list(1L, 2L, 3L), class = "AsIs"),
annotations = structure(list(
4L, 5L, 6L), class = "AsIs")),
class = "data.frame", row.names = c(NA,
-3L))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L))
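Two things may differ on the real tweet data, so here is a rough sketch rather than a drop-in solution. First, geo contains a data.frame (coordinates) inside a data.frame, so a single unpack pass would leave one data.frame column behind. Second, as far as I know, unnest on several list columns at once needs the elements in each row to have compatible lengths, which hashtags and mentions usually do not. The toy data and the helper unpack_all below are made up for illustration and are not part of tidyr; names_sep and keep_empty are existing arguments of unpack and unnest.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: `geo` holds a nested data.frame, and the two list
# columns have different lengths within the same row.
toy <- tibble(
  id = c("t1", "t2", "t3"),
  geo = tibble(
    place_id = c(NA, "p1", NA),
    coordinates = tibble(type = c(NA, "Point", NA), lon = c(NA, 1.5, NA))
  ),
  entities = tibble(
    hashtags = list(c("a", "b"), character(0), "c"),
    mentions = list("x", c("y", "z"), NULL)
  )
)

# unpack_all() is a hypothetical helper (not a tidyr function): it repeats
# unpack() until no data.frame columns remain. names_sep prefixes each inner
# column with its parent's name, which avoids name clashes.
unpack_all <- function(data, sep = "_") {
  while (any(vapply(data, is.data.frame, logical(1)))) {
    data <- unpack(data, where(is.data.frame), names_sep = sep)
  }
  data
}

flat <- unpack_all(toy)
names(flat)

# Unnest one list column at a time; keep_empty = TRUE keeps tweets whose
# list element is empty (they get NA instead of being dropped).
hashtags_long <- flat %>%
  select(id, entities_hashtags) %>%
  unnest(entities_hashtags, keep_empty = TRUE)
hashtags_long
```

Keeping one long table per list column (hashtags, mentions, urls, ...) keyed by id and joining only when needed avoids multiplying rows, which is what chained unnest calls on unrelated list columns would do.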
Upvotes: 1