Reputation: 3461
I have a dataframe in which the columns represent species. The species affiliation is encoded in the suffix of the column name:
Ac_1234_AnyString
The string after the second underscore (_) represents the species affiliation. I want to plot some networks based on rank correlations and color the nodes according to their species affiliation when I later create Fruchterman-Reingold graphs with library(qgraph). Previously I did this by sorting the df by the name suffix and then creating index vectors by counting the columns manually:
list.names <- c("SG01", "SG02")
list <- vector("list", length(list.names))
names(list) <- list.names
list$SG01 <- c(1:12)
list$SG02 <- c(13:25)
str(list)
List of 2
$ SG01 : int [1:12] 1 2 3 4 5 6 7 8 9 10 ...
$ SG02 : int [1:13] 13 14 15 16 17 18 19 20 21 22 ...
This was very tedious for the big datasets I am working with. The question is: how can I avoid the manual sorting and counting, and extract vectors (or a list) of column positions in the dataframe, grouped by suffix? I know I can create a vector with the suffix information with
indx <- gsub(".*_", "", names(my_data))
str(indx)
chr [1:29]
"4" "6" "6" "6" "6" "6" "11" "6" "6" "6" "6" "6" "3" "18" "6" "6" "6" "5" "5"
"6" "3" "6" "3" "6" "NA" "6" "5" "4" "11"
Now I need to create vectors with the positions of all the "4"s, "6"s and so on:
List of 7
$ 4: int[1:2] 1 28
$ 6: int[1:17] 2 3 4 5 6 8 9 10 11 12 15 16 17 20 22 24 26
$ 11: int[1:2] 7 29
....
Thank you.
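A note on the suffix extraction used in the question: gsub(".*_", "", ...) is greedy, so it keeps whatever follows the last underscore, not necessarily the second one. The two only differ when the suffix itself contains an underscore. A minimal sketch of the difference, using made-up column names (they are assumptions, not taken from the real data):

```r
# Hypothetical column names; the third one has an underscore inside its suffix.
col_names <- c("Ac_1234_SG01", "Bd_77_SG02", "Ce_5_SG_03")

# Greedy match: strips everything up to the LAST underscore.
after_last <- gsub(".*_", "", col_names)

# Strips only the first two underscore-delimited fields,
# i.e. keeps everything after the SECOND underscore.
after_second <- sub("^[^_]*_[^_]*_", "", col_names)

print(after_last)
print(after_second)
```

If the suffixes never contain underscores, both patterns give the same result and the shorter one is fine.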
Upvotes: 2
Views: 63
Reputation: 24074
You can try:
sapply(unique(indx), function(x, vec) which(vec==x), vec=indx)
# $`4`
# [1] 1 28
# $`6`
# [1] 2 3 4 5 6 8 9 10 11 12 15 16 17 20 22 24 26
# $`11`
# [1] 7 29
# $`3`
# [1] 13 21 23
# $`18`
# [1] 14
# $`5`
# [1] 18 19 27
# $`NA`
# [1] 25
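One caveat worth knowing: sapply() returns a list here only because the groups have different lengths. If every suffix happened to occur equally often, it would simplify the result to a vector or matrix. A hedged sketch of two variants that always return a list, using the indx vector from the question (the names groups and groups2 are just illustrative):

```r
# Suffix vector copied from the question.
indx <- c("4", "6", "6", "6", "6", "6", "11", "6", "6", "6", "6", "6", "3",
          "18", "6", "6", "6", "5", "5", "6", "3", "6", "3", "6", "NA", "6",
          "5", "4", "11")

# split() always returns a list; setting the factor levels to unique(indx)
# keeps the groups in order of first appearance.
groups <- split(seq_along(indx), factor(indx, levels = unique(indx)))

# The same idea with sapply(), with simplification switched off.
groups2 <- sapply(unique(indx), function(x) which(indx == x), simplify = FALSE)

str(groups)
```

Note that "NA" is an ordinary string in indx, so it ends up as a regular group rather than being dropped.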
Upvotes: 5
Reputation: 887048
Another option is
setNames(split(seq_along(indx), match(indx, unique(indx))), unique(indx))
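For the plotting step mentioned in the question, a list like this can be handed to qgraph through its groups argument. The following is only a sketch under assumptions: the toy data and column names are invented, the qgraph package must be installed, and layout = "spring" is used as the force-directed (Fruchterman-Reingold style) layout.

```r
# Hypothetical toy data: six columns whose suffix encodes the group.
set.seed(1)
my_data <- as.data.frame(matrix(rnorm(60), nrow = 10))
names(my_data) <- c("Ac_1_SG01", "Bd_2_SG02", "Ce_3_SG01",
                    "Df_4_SG02", "Eg_5_SG01", "Fh_6_SG03")

# Suffix per column, then column positions grouped by suffix.
indx <- gsub(".*_", "", names(my_data))
groups <- setNames(split(seq_along(indx), match(indx, unique(indx))),
                   unique(indx))

# Rank (Spearman) correlations between the columns.
cor_mat <- cor(my_data, method = "spearman")

# Only attempt the plot when qgraph is available.
if (requireNamespace("qgraph", quietly = TRUE)) {
  qgraph::qgraph(cor_mat, groups = groups, layout = "spring")
}
```

Because the positions in groups refer to the column order of my_data, no manual sorting or counting is needed before plotting.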
Upvotes: 2