BJSP

Network graph R - joining

Looking for some help with a join so I can make a forceNetwork() graph using networkD3. I can't figure out what's wrong with the code below, as I'm getting the following warning messages.

I used this code before and got it to work back then. I'm not sure what's different this time, as the input file seems to be the same.

Warning messages:
1: Column `src`/`name` joining factors with different levels, coercing to character vector 
2: Column `target`/`name` joining factors with different levels, coercing to character vector 



# Load package
library(networkD3)
library(dplyr) 

# Create data
src <- c(all_artists$from)
target <- c(all_artists$to)
networkData <- data.frame(src, target, stringsAsFactors = TRUE)
networkData

nodes <- data.frame(name = unique(c(src, target)), size = all_artists$related_artist_followers, stringsAsFactors = TRUE)
nodes$id <- 0:(nrow(nodes) - 1)

nodes

width <- c(all_artists$related_artist_followers)
width

# create a data frame of the edges that uses id 0:9 instead of their names
edges <- networkData %>%
  left_join(nodes, by = c("src" = "name")) %>%
  select(-src) %>%
  rename(source = id) %>%
  left_join(nodes, by = c("target" = "name")) %>%
  select(-target) %>%
  rename(target = id)

The dataset lists artists that are related to each other: each row is an edge from the artist in from to the related artist in to.

from         to           artist_popularity
Jay-Z        Kanye West   80
Jay-Z        P. Diddy     60
Kanye West   Kid Cudi     40
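The warnings point at factor levels. A likely cause, sketched here in base R with the three sample rows above as assumed input: with stringsAsFactors = TRUE, networkData$src and nodes$name become factors with different level sets (src only contains the source artists, while name contains every artist), so the dplyr join has to coerce both keys to character, which is what the warning reports. These are warnings, not errors; the join still runs.

```r
# Sample rows from the question (assumed input; the real all_artists has more columns)
all_artists <- data.frame(
  from = c("Jay-Z", "Jay-Z", "Kanye West"),
  to   = c("Kanye West", "P. Diddy", "Kid Cudi"),
  artist_popularity = c(80, 60, 40),
  stringsAsFactors = FALSE
)
src    <- all_artists$from
target <- all_artists$to

# With stringsAsFactors = TRUE each factor column gets its own set of levels
networkData <- data.frame(src, target, stringsAsFactors = TRUE)
nodes <- data.frame(name = unique(c(src, target)), stringsAsFactors = TRUE)

levels(networkData$src)   # only the artists that appear as sources
levels(nodes$name)        # every artist
identical(levels(networkData$src), levels(nodes$name))  # FALSE, hence the warning

# Keeping the key columns as character avoids the level mismatch entirely
networkData_chr <- data.frame(src, target, stringsAsFactors = FALSE)
nodes_chr <- data.frame(name = unique(c(src, target)), stringsAsFactors = FALSE)
is.character(networkData_chr$src) && is.character(nodes_chr$name)  # TRUE
```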

Answers (1)

CJ Yetman

The line where you build the nodes data frame seems unlikely to work as expected: there's no connection between the length of unique(c(src, target)) (one entry per distinct artist) and all_artists$related_artist_followers (one entry per row of all_artists), so data.frame() will either throw an error or pair the sizes with the wrong artists. You could instead count the number of times a node/name appears in the networkData$src (or all_artists$from) column with...

nodes$size <- sapply(nodes$name, function(name) sum(networkData$src %in% name))
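A quick check of this counting approach on the sample data (assumed to be the three rows from the question). The table()/factor() line is an alternative vectorised count added here only to cross-check the sapply() result; it is not from the answer. Note that nodes ends up with 4 rows while there are only 3 edges, which is why a per-row column from all_artists cannot simply be dropped into nodes.

```r
# Sample edges from the question (assumed input)
networkData <- data.frame(
  src    = c("Jay-Z", "Jay-Z", "Kanye West"),
  target = c("Kanye West", "P. Diddy", "Kid Cudi"),
  stringsAsFactors = FALSE
)
nodes <- data.frame(name = unique(c(networkData$src, networkData$target)),
                    stringsAsFactors = FALSE)

# Row counts differ: 4 distinct artists, 3 edges
c(nodes = nrow(nodes), edges = nrow(networkData))

# One size per node: how often the artist appears as a source
nodes$size <- sapply(nodes$name, function(name) sum(networkData$src %in% name))

# Cross-check with a vectorised count, using the node names as factor levels
size_tab <- as.integer(table(factor(networkData$src, levels = nodes$name)))
stopifnot(identical(unname(nodes$size), size_tab))

nodes
```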

Once you have the nodes data frame created, it's easy to convert the names in the networkData data frame to zero-indexed indices with...

networkData$src <- match(networkData$src, nodes$name) - 1
networkData$target <- match(networkData$target, nodes$name) - 1
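A sketch of the same conversion with two sanity checks added (the checks are additions, not part of the answer). match() returns 1-based positions, and networkD3 expects zero-based indices, hence the - 1; any name missing from nodes would come back as NA, and every index should fall between 0 and nrow(nodes) - 1.

```r
networkData <- data.frame(
  src    = c("Jay-Z", "Jay-Z", "Kanye West"),
  target = c("Kanye West", "P. Diddy", "Kid Cudi"),
  stringsAsFactors = FALSE
)
nodes <- data.frame(name = unique(c(networkData$src, networkData$target)),
                    stringsAsFactors = FALSE)

# match() gives 1-based row positions in nodes; subtract 1 for zero-based indices
networkData$src    <- match(networkData$src, nodes$name) - 1
networkData$target <- match(networkData$target, nodes$name) - 1

# A name that is not in nodes would have produced NA
stopifnot(!anyNA(networkData$src), !anyNA(networkData$target))

# All indices must lie in 0..(nrow(nodes) - 1); note the parentheses,
# since %in% binds more tightly than binary minus in R
stopifnot(all(c(networkData$src, networkData$target) %in% (seq_len(nrow(nodes)) - 1)))

networkData
```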

Note that you also need to provide a Value parameter for the Links data frame and a Group parameter for the Nodes data frame. Any parameter that does not have a default value in the help file is mandatory, and if you leave one out you might get an error or unexpected behavior. That goes for all R functions, not just networkD3's. You can create columns in your data frames for them like this...

networkData$value <- 1
nodes$group <- 1
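The helper below, check_force_inputs(), is hypothetical (it is not part of networkD3); it is a minimal sketch that checks the columns you plan to pass as Source, Target, Value, NodeID and Group exist, and that the indices are in range, before forceNetwork() is called. The constant 1 for value could presumably be swapped for a numeric column such as artist_popularity if you want edge widths to vary; that is an assumption not tried in the answer.

```r
# Hypothetical helper, not a networkD3 function: validate inputs for forceNetwork()
check_force_inputs <- function(links, nodes, source = "src", target = "target",
                               value = "value", node_id = "name", group = "group") {
  missing_links <- setdiff(c(source, target, value), names(links))
  missing_nodes <- setdiff(c(node_id, group), names(nodes))
  if (length(missing_links) > 0)
    stop("Links is missing: ", paste(missing_links, collapse = ", "))
  if (length(missing_nodes) > 0)
    stop("Nodes is missing: ", paste(missing_nodes, collapse = ", "))
  # Source/Target must be zero-based row indices into nodes
  idx <- c(links[[source]], links[[target]])
  if (anyNA(idx) || any(idx < 0 | idx >= nrow(nodes)))
    stop("Source/Target must be 0-based row indices into Nodes")
  invisible(TRUE)
}

# Links already converted to zero-based indices, nodes with names only
networkData <- data.frame(src = c(0, 0, 1), target = c(1, 2, 3))
nodes <- data.frame(name = c("Jay-Z", "Kanye West", "P. Diddy", "Kid Cudi"),
                    stringsAsFactors = FALSE)

# Without value and group columns the check fails
ok_before <- tryCatch(check_force_inputs(networkData, nodes), error = function(e) FALSE)

networkData$value <- 1   # constant edge width, as in the answer
nodes$group <- 1         # single colour group, as in the answer
ok_after <- check_force_inputs(networkData, nodes)
```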

So all together in a reproducible example, you might have...

# Sample data
from <- c("Jay-Z", "Jay-Z", "Kanye West")
to <- c("Kanye West", "P. Diddy", "Kid Cudi")
artist_popularity <- c(80, 60, 40)
all_artists <- data.frame(from, to, artist_popularity, stringsAsFactors = FALSE)

# Links: one row per edge, names kept as character rather than factors
networkData <- data.frame(src = all_artists$from, target = all_artists$to, 
                          stringsAsFactors = FALSE)

# Nodes: one row per distinct artist
nodes <- data.frame(name = unique(c(networkData$src, networkData$target)), 
                    stringsAsFactors = FALSE)

# Node size: number of times the artist appears as a source
nodes$size <- sapply(nodes$name, function(name) sum(networkData$src %in% name))

# Replace names with zero-based row indices into nodes
networkData$src <- match(networkData$src, nodes$name) - 1
networkData$target <- match(networkData$target, nodes$name) - 1

# Required Value (Links) and Group (Nodes) columns
networkData$value <- 1
nodes$group <- 1

library(networkD3)

forceNetwork(Links = networkData, Nodes = nodes, Source = "src", 
             Target = "target", Value = "value", NodeID = "name", 
             Nodesize = "size", Group = "group", opacityNoHover = 1)
