Reputation: 415
Sorry if this is a rookie question, and for the long post. Thank you in advance. I have a dataset of 88250 rows and 131 columns; rows are observations and columns are labels and variables (columns 1:21 are character labels and columns 22:131 are double variables). I am trying to use UMAP from the uwot package to visualise the data and later perform supervised training. The first thing I tried was to tune the parameters of the UMAP model, namely n_neighbors and min_dist. The UMAP output is a table of X and Y coordinates, which I can attach to my data frame and then plot. Here is the code for one chosen set of parameters; with it I could draw a scatter plot and convert it to a 2D density plot to visualise differences between treatments, hence the facet_wrap.
library(uwot)
#define real data and labels
df.labels = df[,1:21]
df.data = df[,22:131]
#apply UMAP transformation
df.umap <- umap(df.data, n_sgd_threads = 0, n_trees = 500, n_neighbors = 50,
                min_dist = 0.2, pca = 50,
                verbose = TRUE)
df$UMAPX<- df.umap[,1]
df$UMAPY<- df.umap[,2]
library(ggplot2)
m <- ggplot(df, aes(x = UMAPX, y = UMAPY)) +
  geom_point() +
  scale_x_continuous(name = "UMAP_X-axis_coordinates") +
  scale_y_continuous(name = "UMAP_y-axis_coordinates") +
  theme(axis.text.x = element_blank()) +
  theme(axis.text.y = element_blank()) +
  theme(axis.line = element_line(colour = "black",
                                 size = 0.1,
                                 linetype = "solid")) +
  labs(title = "UMAP visualisation")
#try 2d density plot and see some distribution
m +
  geom_density_2d() +
  stat_density_2d(aes(fill = ..level..), geom = "polygon") +
  scale_fill_gradient(low = "blue", high = "red") +
  facet_wrap(df.labels$treatmentsum ~ .)
Now I want to write loops that store all the UMAP results in a list, where each element is the data frame with the UMAP X and Y coordinates for one tested pair of parameter values. This worked and I got my list.
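#attempt to perform a grid search for hyperparameter tuning
#iterate over the grid, set manually
#performance evaluation
#a single neighbour is not a meaningful neighbourhood, so the grid starts at 2
n_neighbors.test <- seq(2, 100, 20)
#note: uwot's default spread is 1; min_dist values above spread may not give a usable embedding
min_dist.test <- seq(0.05, 4, 0.5)
#create a data frame containing all combinations of the grid
hyper_grid <- expand.grid(n_neighbors = n_neighbors.test, min_dist = min_dist.test)
#create an empty list to store the models
models <- vector("list", nrow(hyper_grid))
#execute the grid search
for (i in seq_len(nrow(hyper_grid))) {
  #get the value pair at row i
  n_neighbors <- hyper_grid$n_neighbors[i]
  min_dist <- hyper_grid$min_dist[i]
  #train a model with this pair (the pair must be passed to umap) and store it in the list
  models[[i]] <- umap(df.data, n_neighbors = n_neighbors, min_dist = min_dist,
                      n_sgd_threads = 0, n_trees = 500)
}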
#collect the x, y coordinates from the UMAP grid search into a list of data frames for later visualisation
para <- list()
for (i in seq_along(models)) {
  df$UMAPX <- models[[i]][,1]
  df$UMAPY <- models[[i]][,2]
  #df now carries UMAPX and UMAPY, so store a copy of it directly
  para[[i]] <- df
}
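One way to keep each parameter pair next to its coordinates is to store the pair as extra columns in every data frame of the list. The sketch below uses a hypothetical 2 x 2 grid and constant stand-in matrices instead of real UMAP output, so the names `grid`, `models` and `df` here are illustrative only and not the objects from the post.

```r
# Hypothetical toy grid: expand.grid varies the first column fastest
grid <- expand.grid(n_neighbors = c(5, 50), min_dist = c(0.1, 0.5))

# Stand-in "embeddings": one 3 x 2 matrix per grid row (not real UMAP output)
models <- lapply(seq_len(nrow(grid)), function(i) matrix(i, nrow = 3, ncol = 2))

# Stand-in for the labelled data frame
df <- data.frame(id = 1:3)

# Build one data frame per grid row, carrying coordinates and parameters together
para <- lapply(seq_along(models), function(i) {
  out <- df
  out$UMAPX <- models[[i]][, 1]
  out$UMAPY <- models[[i]][, 2]
  out$n_neighbors <- grid$n_neighbors[i]
  out$min_dist <- grid$min_dist[i]
  out
})
```

Because row i of the grid produced models[[i]], element i of the list always describes the same parameter pair, which makes later plot titles easy to build.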
Here I got stuck. I want to run this ggplot code on each data frame in the list, using x=UMAPX and y=UMAPY, to generate 40 plots, one per tested pair of n_neighbors and min_dist, each with the 15-panel facet wrap. I thought I could turn the previous ggplot code into a function and use map to apply it to every element of the list para, but the plot list comes back NULL and no error is raised. The PDF file written afterwards is empty.
library(purrr)
plot <- map(para, function(i) {
  for (i in 1:40) {
    ggplot(para[[i]], aes(x = UMAPX, y = UMAPY)) +
      geom_point() +
      scale_x_continuous(name = "UMAP_X-axis_coordinates") +
      scale_y_continuous(name = "UMAP_y-axis_coordinates") +
      theme(axis.text.x = element_blank()) +
      theme(axis.text.y = element_blank()) +
      theme(axis.line = element_line(colour = "black",
                                     size = 0.1,
                                     linetype = "solid")) +
      labs(title = "UMAP visualisation for model") +
      geom_density_2d() +
      stat_density_2d(aes(fill = ..level..), geom = "polygon") +
      scale_fill_gradient(low = "blue", high = "red") +
      facet_wrap(df.labels$treatmentsum ~ .)
  }
})
pdf("plots.pdf")
for (i in 1:length(plot)) {
print(plot[[i]])
}
dev.off()
Upvotes: 0
Views: 586
Reputation: 1363
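The answer to the original problem is in the comments: map already passes each element of para to the function as i, so remove the inner for loop and replace para[[i]] with i. The for loop is why the list held only NULL, since a for loop returns NULL rather than the plot.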
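A minimal sketch of why the list came back full of NULL, using base R's lapply and plain numbers as stand-ins for the data frames (purrr's map collects function return values in the same way). The names `broken` and `fixed` are made up for this illustration.

```r
# Stand-in for the list of data frames; plain numbers keep the example small
para <- list(1, 2, 3)

# Same shape as the code in the question: the function body is a for loop.
# A for loop evaluates to NULL, so the function returns NULL for every element.
broken <- lapply(para, function(i) {
  for (j in 1:3) {
    i * j
  }
})

# Without the loop, the last expression is the function's return value
fixed <- lapply(para, function(p) p * 10)

# TRUE when every element of the broken result is NULL
all_null <- all(vapply(broken, is.null, logical(1)))
```

Printing a NULL draws nothing, which would explain why the PDF ended up without any plots.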
To add a title to the plot:
One way is to map over para and the n_neighbors column of hyper_grid simultaneously, and use that value in the title. If I understand your code correctly, the following should work. Subsetting hyper_grid$n_neighbors with [1:40] may be unnecessary if 40 is the total number of rows in hyper_grid.
plot <- map2(para, hyper_grid$n_neighbors[1:40], function(param, n_neighbors) {
  ggplot(param, aes(x = UMAPX, y = UMAPY)) +
    geom_point() +
    scale_x_continuous(name = "UMAP_X-axis_coordinates") +
    scale_y_continuous(name = "UMAP_y-axis_coordinates") +
    theme(axis.text.x = element_blank()) +
    theme(axis.text.y = element_blank()) +
    theme(axis.line = element_line(colour = "black",
                                   size = 0.1,
                                   linetype = "solid")) +
    # paste() inserts a space between its arguments
    labs(title = paste("UMAP visualization for model with n_neighbors:", n_neighbors)) +
    geom_density_2d() +
    stat_density_2d(aes(fill = ..level..), geom = "polygon") +
    scale_fill_gradient(low = "blue", high = "red") +
    facet_wrap(df.labels$treatmentsum ~ .)
})
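If the title should show both tuned parameters, one option is to build a vector of titles from the grid first and hand that to map2 as the second argument in place of the n_neighbors column. This is only a sketch on a hypothetical 2 x 2 grid; `grid` and `titles` are illustrative names, not objects from the question.

```r
# Hypothetical toy grid with the same column names as hyper_grid
grid <- expand.grid(n_neighbors = c(5, 50), min_dist = c(0.1, 0.5))

# sprintf is vectorised, so this yields one title per grid row
titles <- sprintf("UMAP visualization: n_neighbors = %g, min_dist = %.2f",
                  grid$n_neighbors, grid$min_dist)
```

Inside the plotting function the second argument would then be used directly, as in labs(title = title), assuming the function's second parameter is named title.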
Upvotes: 1