Blurbz

Reputation: 13

unnest() in R does not work on a large data sample

I am unnesting data from a JSON file. When I make a small sample, the unnest() function works, but when I try to run it on the large, original dataframe I get the error below.

`Error in bind_rows_(x, .id) : 
  Column lines can't be converted from integer to list`

My code is below. The JSON data comes from GitHub's API.

`repo_data <- fromJSON("data/data/repos.json")`

Small data frame, only the first 100 rows:

`repo_small <- head(repo_data, 100)`

Tidy the repo data, unnesting languages and lines of code:

`df_repo <- repo_small %>% select(ownerName, name, languages, ownerType) %>% unnest()`

When I filtered the data I found no NA rows or anything else unusual. The only column I need to unnest is languages.

Languages is a list that contains 2 lists. The first list is name and has values like "Java", "Python", and "Ruby". These are character values. The second list is lines and has values like 104, 109432, and 10. These are integer values.
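One way to check whether every row really has this structure is to tabulate the class of lines in each nested data frame. This is a minimal sketch, assuming languages is a list of data frames that each have a lines column. The `languages` object below is hypothetical stand-in data; on the real data it would be `repo_data$languages`.

```r
# Hypothetical stand-in for repo_data$languages: a list of data frames.
languages <- list(
  data.frame(name = "Java", lines = 104L),
  data.frame(name = c("Python", "Ruby"), lines = c(109432L, 10L))
)

# A third entry whose `lines` column is a list instead of an integer vector.
odd <- data.frame(name = "R")
odd$lines <- list(10L)
languages[[3]] <- odd

# Record the class of `lines` in every nested data frame.
line_types <- vapply(languages, function(x) class(x$lines)[1], character(1))

# Count each class, then list the positions that are not plain integers.
table(line_types)
which(line_types != "integer")
```

Any position reported by `which()` is a row whose nested data frame differs from the rest and is worth inspecting by hand.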

As requested, here is some sample code to replicate the data. testdf is the data frame and language is the column in question.

```
owner <- c("github", "palentir", "apple")
gitcode <- data.frame(name = c("java"), lines = c(81))
palentircode <- data.frame(name = c("java", "python", "R"), lines = c(200, 45, 903))
applecode <- data.frame(name = c("java", "ruby"), lines = c(12, 120))
langauge <- list(gitcode, palentircode, applecode)
testdf <- data.frame(owner)
testdf$language <- langauge
```
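For reference, a list-column built this way, where every nested data frame has the same column types, should unnest without error. That would suggest the failure on the full data comes from rows whose structure differs from this sample. A minimal sketch, assuming tidyr is installed; it rebuilds the same sample data so it can run on its own.

```r
library(tidyr)

# Rebuild the sample data: one nested data frame of languages per owner.
gitcode <- data.frame(name = c("java"), lines = c(81))
palentircode <- data.frame(name = c("java", "python", "R"), lines = c(200, 45, 903))
applecode <- data.frame(name = c("java", "ruby"), lines = c(12, 120))

testdf <- data.frame(owner = c("github", "palentir", "apple"))
testdf$language <- list(gitcode, palentircode, applecode)

# Expand the list-column: one output row per owner/language pair.
result <- unnest(testdf, language)
result
```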

Upvotes: 1

Views: 1837

Answers (1)

De Novo

Reputation: 7640

From the documentation of unnest():

unnest() can handle list-columns that can contain atomic vectors, lists, or data frames (but not a mixture of the different types).

You have two different atomic types in your list. I don't know if this is the structure of your data or not, without a reproducible example as requested in the comments, but this illustrates the requirement of unnest()

library(tidyr)

DF <- data.frame(a = 1:2)
DF$name <- list(c("Java", "Python", "Ruby"), c(104L, 109432L, 10L))
unnest(DF, name)
# fails: the list-column mixes a character vector and an integer vector

If this is the problem, you'll have to convert the second element of the list to character first.

DF$name[[2]] <- as.character(DF$name[[2]])
unnest(DF, name)
#   a   name
# 1 1   Java
# 2 1 Python
# 3 1   Ruby
# 4 2    104
# 5 2 109432
# 6 2     10
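The same idea can be applied to a list-column of data frames, which is closer to what the question describes. This is only a sketch under assumptions: the data below is hypothetical, and it assumes the problem is that some nested data frames store `lines` as a list rather than an integer vector, with no empty or NULL entries.

```r
library(tidyr)

# Hypothetical data: `ok` has an integer `lines` column, `bad` has a list one.
ok <- data.frame(name = c("java", "ruby"), lines = c(12L, 120L))
bad <- data.frame(name = "python")
bad$lines <- list(45L)

df <- data.frame(owner = c("apple", "palentir"))
df$language <- list(ok, bad)

# Coerce every nested `lines` column to a plain integer vector.
# Assumes each entry holds exactly one value per row.
df$language <- lapply(df$language, function(x) {
  x$lines <- as.integer(unlist(x$lines))
  x
})

result <- unnest(df, language)
result
```

Once every nested data frame has the same column types, unnest() can bind them together.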

Upvotes: 2
