Quantizer

Reputation: 297

Convert dataframe to tidy format in R

I have a data frame with the following structure:

> ls.str(df)

attachments : 'data.frame': 1103947 obs. of  2 variables:
 $ media_keys:List of 1103947
 $ poll_ids  :List of 1103947
author_id :  chr [1:1103947] "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" "21572351" ...
conversation_id :  chr [1:1103947] "1006266341341519872" "1006265425791987715" "1006251577747869696" "1006246236171722753" "1006246168991600642" ...
created_at :  chr [1:1103947] "2018-06-11T20:06:05.000Z" "2018-06-11T20:02:27.000Z" "2018-06-11T19:07:26.000Z" "2018-06-11T18:46:12.000Z" ...
entities : 'data.frame':    1103947 obs. of  5 variables:
 $ mentions   :List of 1103947
 $ annotations:List of 1103947
 $ hashtags   :List of 1103947
 $ urls       :List of 1103947
 $ cashtags   :List of 1103947
geo : 'data.frame': 1103947 obs. of  2 variables:
 $ place_id   : chr  NA NA NA NA ...
 $ coordinates:'data.frame':    1103947 obs. of  2 variables:
id :  chr [1:1103947] "1006266341341519872" "1006265425791987715" "1006251577747869696" "1006246236171722753" "1006246168991600642" ...
in_reply_to_user_id :  chr [1:1103947] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA ...

I want to convert it to a tidy format. Is there a neat little function for this that I don't know of? Google hasn't been much help. Thanks in advance!

By tidy format, I mean something like this:

#> # A tibble: 25 × 31
#>    tweet_id       user_username text  conversation_id author_id lang  created_at
#>    <chr>          <chr>         <chr> <chr>           <chr>     <chr> <chr>     
#>  1 1406007405180… Phardiga      "RT … 14060074051803… 58755490  de    2021-06-1…
#>  2 1405617386405… dorothee_goe… "RT … 14056173864058… 97759337… de    2021-06-1…
#>  3 1405616047990… dejools       "RT … 14056160479909… 13065071… de    2021-06-1…
#>  4 1405615055555… LenaOetzel    "RT … 14056150555557… 97897581… de    2021-06-1…
#>  5 1405613064968… jenniferhenk… "RT … 14056130649684… 114774406 de    2021-06-1…
#>  6 1405610724026… Tobias_Schul… "Ihr… 14056107240266… 47919307  de    2021-06-1…
#>  7 1405393033558… HTMIBerlin    "👩‍💻…  14053930335589… 94052353… und   2021-06-1…
#>  8 1404808751857… Tobias_Schul… ".@j… 14048087518576… 47919307  de    2021-06-1…
#>  9 1404440929881… ASattelmacher "Oka… 14044409298812… 11508518… de    2021-06-1…
#> 10 1404393457427… dr_john_aus_b "#Ic… 14043934574273… 30635588… und   2021-06-1…

Upvotes: 1

Views: 721

Answers (1)

akrun

Reputation: 887881

With tidyr, we can first call unpack to spread the data.frame columns out into regular columns, and then call unnest to expand the list columns into regular columns:

library(dplyr)
library(tidyr)
df %>% 
  unpack(where(is.data.frame)) %>%
  unnest(where(is.list))

-output

# A tibble: 3 × 6
  media_keys poll_ids author_id conversation_id mentions annotations
       <int>    <int>     <int>           <int>    <int>       <int>
1          1        4         1               1        1           4
2          2        5         2               2        2           5
3          3        6         3               3        3           6
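One thing to watch for on the real data: by default unnest drops rows whose list element is empty, and unnesting several list columns in one call needs their elements to have compatible sizes within each row. A minimal sketch of unnesting a single column while keeping the empty rows, assuming a made-up `toy` tibble with a character list column (the real `hashtags` column may hold data frames instead):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one list column with elements of different lengths,
# including an empty one (a tweet without hashtags).
toy <- tibble(
  id = c("t1", "t2", "t3"),
  hashtags = list(c("r", "tidyr"), character(0), "dplyr")
)

# keep_empty = TRUE keeps the row for "t2" and fills it with NA
# instead of dropping it.
long <- toy %>%
  unnest(hashtags, keep_empty = TRUE)

long
```

Here each hashtag gets its own row, so `id` repeats for tweets with more than one hashtag.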

data

df <- structure(list(
    attachments = structure(list(
        media_keys = structure(list(1L, 2L, 3L), class = "AsIs"),
        poll_ids = structure(list(4L, 5L, 6L), class = "AsIs")),
        class = "data.frame", row.names = c(NA, -3L)),
    author_id = 1:3,
    conversation_id = 1:3,
    entities = structure(list(
        mentions = structure(list(1L, 2L, 3L), class = "AsIs"),
        annotations = structure(list(4L, 5L, 6L), class = "AsIs")),
        class = "data.frame", row.names = c(NA, -3L))),
    class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L))
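In the question's data, `geo` is a data.frame column that itself contains another data.frame column (`coordinates`), so a single unpack call would leave one level still packed. A rough sketch of unpacking repeatedly until no data.frame columns remain; `unpack_all` is a hypothetical helper name, and `toy` (including the `type` field) is made-up data that only mimics that nesting:

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: keep unpacking until no data.frame columns are left.
# names_sep prefixes inner names with the outer name to avoid name clashes.
unpack_all <- function(data, sep = "_") {
  while (any(vapply(data, is.data.frame, logical(1)))) {
    data <- unpack(data, where(is.data.frame), names_sep = sep)
  }
  data
}

# Toy data with a data.frame column nested inside another data.frame column.
toy <- tibble(
  id = c("a", "b"),
  geo = tibble(
    place_id = c(NA, "p1"),
    coordinates = tibble(type = c(NA, "Point"))
  )
)

flat <- unpack_all(toy)
names(flat)
```

After this step the remaining list columns can be passed to unnest as above.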

Upvotes: 1
