Reputation: 5935
I am using R and am trying to re-create something like this picture:
I did some research and found that the "ggforce" library (https://ggforce.data-imaginist.com/reference/geom_parallel_sets.html) in R can produce this style of plot. In my case the plot would use the "first name", "middle name" and "last name" columns, and would show that a given first name is very common, a given combination of first and middle name is less common, and a full combination of first, middle and last name is much less common:
library(ggforce)
library(reshape2)
name_data <- data.frame(
"First_Name" = c("John", "John", "John", "John", "John", "John", "James", "James", "Adam", "Adam", "Henry"),
"Middle_Name" = c("Claude", "Claude", "Claude", "Smith", "Smith", "Peters", "Stevens", "Stevens", "Ford", "Tom", "Frank"),
"Last Name " = c("Tony", "Tony", "Frank", "Carson", "Phil", "Lewis", "Eric", "David", "Roberts", "Scott", "Xavier")
)
name_data$ID <- seq.int(nrow(name_data))
data <- reshape2::melt(name_data)
data <- gather_set_data(name_data)
ggplot(name_data, aes( id = value, split = First_Name, value = value)) +
geom_parallel_sets(aes( alpha = 0.3, axis.width = 0.1) +
geom_parallel_sets_axes(axis.width = 0.1) +
geom_parallel_sets_labels(colour = 'white'))
But this returns the following error:
Error in FUN(X[[i]], ...) : object 'x' not found
Can someone please show me what I am doing wrong?
Thanks
Upvotes: 0
Views: 188
Reputation: 698
The first argument in the ggplot aes function is the x-axis variable. In the example you provided, that was x=survived (which was probably set earlier in the example). You need to specify an x-axis variable; in this case perhaps it is x=City? i.e. try:
ggplot(name_data, aes(x=City, id=ID, ...
EDIT:
OK, it looks like you first need a count of the different name combinations (this already existed in the Titanic data example as the value column). You can do this with the aggregate function:
name_counts=aggregate(name_data$ID,
by=list(First_Name=name_data$First_Name,
Middle_Name=name_data$Middle_Name,
Last_Name=name_data$Last.Name.),
FUN=length)
names(name_counts)[4] = 'value'
This gives a count of each combination of first, middle and last names. At this point, run the gather_set_data function:
name_counts_gathered = gather_set_data(name_counts, 1:3)
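Before plotting, it can help to confirm that the counting step produced what you expect. The following is a minimal base-R sketch that rebuilds the data from the question and repeats the aggregate call. The column name Last_Name is my own tidy-up (the question's "Last Name " is turned into Last.Name. by data.frame), and the final check is only illustrative.

```r
# Rebuild the question's data. Last_Name is renamed here for readability
# (an assumption; in the question the column ends up as Last.Name.).
name_data <- data.frame(
  First_Name  = c("John", "John", "John", "John", "John", "John",
                  "James", "James", "Adam", "Adam", "Henry"),
  Middle_Name = c("Claude", "Claude", "Claude", "Smith", "Smith", "Peters",
                  "Stevens", "Stevens", "Ford", "Tom", "Frank"),
  Last_Name   = c("Tony", "Tony", "Frank", "Carson", "Phil", "Lewis",
                  "Eric", "David", "Roberts", "Scott", "Xavier")
)
name_data$ID <- seq_len(nrow(name_data))

# Count how many rows share each first/middle/last combination.
name_counts <- aggregate(name_data$ID,
                         by = list(First_Name  = name_data$First_Name,
                                   Middle_Name = name_data$Middle_Name,
                                   Last_Name   = name_data$Last_Name),
                         FUN = length)
names(name_counts)[4] <- "value"

print(name_counts)

# Every original row should have been counted exactly once.
stopifnot(sum(name_counts$value) == nrow(name_data))
```

With this data only the John / Claude / Tony combination occurs more than once, so most of the bands in the resulting plot will have the same width.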
Now you can plot using ggplot and geom_parallel_sets:
ggplot(name_counts_gathered) +
geom_parallel_sets(aes(x=x,id=id,split=y,value=value))
The gather_set_data function adds the id, x and y columns required by the plotting function.
I'm not entirely sure how you want the plot to look but you can hopefully now play around with the plotting labels and options.
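As a starting point for those options, here is a sketch that puts the steps together and adds the axis boxes and labels, following the pattern of the Titanic example in the ggforce documentation. It assumes ggplot2 and ggforce are installed. The Last_Name column name and the choice of fill = First_Name are my own assumptions, not something taken from the question.

```r
library(ggplot2)
library(ggforce)

# Question's data, with the last-name column renamed to Last_Name (assumption).
name_data <- data.frame(
  First_Name  = c("John", "John", "John", "John", "John", "John",
                  "James", "James", "Adam", "Adam", "Henry"),
  Middle_Name = c("Claude", "Claude", "Claude", "Smith", "Smith", "Peters",
                  "Stevens", "Stevens", "Ford", "Tom", "Frank"),
  Last_Name   = c("Tony", "Tony", "Frank", "Carson", "Phil", "Lewis",
                  "Eric", "David", "Roberts", "Scott", "Xavier")
)
name_data$ID <- seq_len(nrow(name_data))

# One row per name combination, with its count in `value`.
name_counts <- aggregate(name_data$ID,
                         by = list(First_Name  = name_data$First_Name,
                                   Middle_Name = name_data$Middle_Name,
                                   Last_Name   = name_data$Last_Name),
                         FUN = length)
names(name_counts)[4] <- "value"

# Long format: adds the id, x and y columns used by the parallel-sets geoms.
name_counts_gathered <- gather_set_data(name_counts, 1:3)

# Bands coloured by first name (an illustrative choice), plus axes and labels.
p <- ggplot(name_counts_gathered,
            aes(x, id = id, split = y, value = value)) +
  geom_parallel_sets(aes(fill = First_Name), alpha = 0.3, axis.width = 0.1) +
  geom_parallel_sets_axes(axis.width = 0.1) +
  geom_parallel_sets_labels(colour = "white")

print(p)
```

If the axes do not come out in the order you want, converting the x column to a factor with the desired level order before plotting is one thing to try. I have not checked how this behaves across ggforce versions.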
Upvotes: 2