stats_noob

Reputation: 5935

R: Visualizing "Linked" Data

I am using R and am trying to re-create something like this picture:

I did some research and saw that the "ggforce" library (https://ggforce.data-imaginist.com/reference/geom_parallel_sets.html) in R lets you make plots in a similar style. In my case the plot would use the "first name", "middle name" and "last name" columns: it should show that a given first name is very common, that first name together with a given middle name is a bit less common, and the full combination of first, middle and last name is much less common:

library(ggforce)
library(reshape2)

 name_data <- data.frame(
    
    "First_Name" = c("John", "John", "John", "John", "John", "John", "James", "James", "Adam", "Adam", "Henry"),
    "Middle_Name" = c("Claude", "Claude", "Claude", "Smith", "Smith", "Peters", "Stevens", "Stevens", "Ford", "Tom", "Frank"),
    "Last Name " = c("Tony", "Tony", "Frank", "Carson", "Phil", "Lewis", "Eric", "David", "Roberts", "Scott", "Xavier")
)

name_data$ID <- seq.int(nrow(name_data))
 
data <- reshape2::melt(name_data)
data <- gather_set_data(name_data)

ggplot(name_data, aes( id = value, split = First_Name, value = value)) +
  geom_parallel_sets(aes( alpha = 0.3, axis.width = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels(colour = 'white'))

But this returns the following error:

Error in FUN(X[[i]], ...) : object 'x' not found

Can someone please show me what I am doing wrong?

Thanks

Upvotes: 0

Views: 188

Answers (1)

njp

Reputation: 698

The first argument to ggplot's aes() function is the x-axis variable. In the example you provided, that was x=survived (which was probably set earlier in the example). Your call never supplies an x aesthetic, which is why you get "object 'x' not found". You need to specify an x-axis variable; in this case perhaps it is x=City? Try:

ggplot(name_data, aes(x=City, id=ID, ...

EDIT: OK, it looks like you first need a count of the different name combinations (this already existed in the Titanic data example as the value column). You can do this with the aggregate function:

name_counts=aggregate(name_data$ID,
                      by=list(First_Name=name_data$First_Name,
                              Middle_Name=name_data$Middle_Name,
                              Last_Name=name_data$Last.Name.),
                      FUN=length)
names(name_counts)[4] = 'value'

This gives a count of each combination of first, middle and last names. At this point, run the gather_set_data function:

name_counts_gathered = gather_set_data(name_counts, 1:3)

Now, you can plot using ggplot and geom_parallel_sets:

ggplot(name_counts_gathered) +
    geom_parallel_sets(aes(x=x,id=id,split=y,value=value))

The gather_set_data function adds the id, x and y columns as required by the plotting function.
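To make those three columns less mysterious, here is a rough base-R imitation of the reshaping. The helper gather_sketch is hypothetical and is not the ggforce implementation; it only illustrates the shape of the output, using three of the name combinations from the question as toy input:

```r
# Hypothetical helper, NOT ggforce code: imitates the shape of the
# gather_set_data() output using base R only.
gather_sketch <- function(data, cols) {
  data$id <- seq_len(nrow(data))           # one id per original row
  pieces <- lapply(names(data)[cols], function(col) {
    piece <- data
    piece$x <- col                          # the axis this copy belongs to
    piece$y <- as.character(data[[col]])    # the category on that axis
    piece
  })
  do.call(rbind, pieces)                    # stack one copy per axis
}

# Toy input: three combinations taken from the question's data, with counts
counts <- data.frame(
  First_Name  = c("John", "John", "Adam"),
  Middle_Name = c("Claude", "Smith", "Ford"),
  Last_Name   = c("Tony", "Phil", "Roberts"),
  value       = c(2L, 1L, 1L)
)

long <- gather_sketch(counts, 1:3)
str(long)
```

Each original row appears once per axis, so the same id can be traced from the first-name axis through to the last-name axis.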

I'm not entirely sure how you want the plot to look but you can hopefully now play around with the plotting labels and options.
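As a starting point, the steps above can be put together into one script, with the axes and labels layers from the ggforce documentation example added back in. This is only a sketch: the column is named Last_Name here (without the trailing space used in the question), the fill = First_Name mapping is my own choice, and the factor() line assumes the axes would otherwise be ordered alphabetically.

```r
library(ggplot2)
library(ggforce)

name_data <- data.frame(
  First_Name  = c("John", "John", "John", "John", "John", "John",
                  "James", "James", "Adam", "Adam", "Henry"),
  Middle_Name = c("Claude", "Claude", "Claude", "Smith", "Smith", "Peters",
                  "Stevens", "Stevens", "Ford", "Tom", "Frank"),
  Last_Name   = c("Tony", "Tony", "Frank", "Carson", "Phil", "Lewis",
                  "Eric", "David", "Roberts", "Scott", "Xavier")
)
name_data$ID <- seq_len(nrow(name_data))

# Count how often each first/middle/last combination occurs
name_counts <- aggregate(name_data$ID,
                         by = list(First_Name  = name_data$First_Name,
                                   Middle_Name = name_data$Middle_Name,
                                   Last_Name   = name_data$Last_Name),
                         FUN = length)
names(name_counts)[4] <- "value"

# Add the id, x and y columns that geom_parallel_sets expects
name_counts_gathered <- gather_set_data(name_counts, 1:3)

# Assumption: fix the axis order explicitly instead of relying on the default
name_counts_gathered$x <- factor(name_counts_gathered$x,
                                 levels = c("First_Name", "Middle_Name", "Last_Name"))

p <- ggplot(name_counts_gathered,
            aes(x, id = id, split = y, value = value)) +
  geom_parallel_sets(aes(fill = First_Name), alpha = 0.3, axis.width = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels(colour = "white")

print(p)
```

Note that alpha and axis.width are passed outside aes(), since they are fixed settings rather than mappings to columns.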

Upvotes: 2
