Revised question with a more accurate example dataset.
I have several different lists, each containing many characters. I've written up a very short example here:
List1 <- "A + B + C + D + E:F + F:E"
List2 <- "A + B + C + E:F + F:E + G:H + H:G"
List3 <- "J + K + L + L:H + L:H1"
I'm trying to count how often each item occurs across all of these lists, but some items appear twice in reversed form (such as E:F and F:E), and that is causing problems.
Using a lot of loops, X %in% Y, and strsplit (splitting before and after ":"), I've gotten this:
sig_var8
    var count
1     0     0
2     A     2
3     B     2
4     C     2
5     D     1
6   E:F     2
7   F:E     2
8   G:H     1
9   H:G     1
10    J     1
11    K     1
12    L     1
13  L:H     1
14 L:H1     1
What I would like is this:
sig_var8
    var count
1     0     0
2     A     2
3     B     2
4     C     2
5     D     1
6   E:F     2
7   G:H     1
8     J     1
9     K     1
10    L     1
11  L:H     1
12 L:H1     1
Note: in list 1, E:F and F:E are considered the same, so the pair should only be counted once. The same goes for list 2, where G:H == H:G and should only be counted once. Note that grep isn't the best option here, because L:H and L:H1 in list 3 are not the same and need to be counted separately (hence the %in%).
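To make the two rules in that note concrete (exact matching rather than pattern matching, and order-insensitive pairs), here is a small base-R sketch. canonical_key is a hypothetical helper name, not part of the original code; it splits an item on ":", sorts the pieces and pastes them back together, which is similar to what the answers below do.

```r
# grepl() does substring matching, so "L:H" is found inside "L:H1"
grepl("L:H", "L:H1", fixed = TRUE)   # TRUE
# %in% compares whole elements, so the two items stay distinct
"L:H" %in% "L:H1"                    # FALSE

# Hypothetical helper: an order-insensitive key for an interaction term
canonical_key <- function(item) {
  parts <- strsplit(item, ":", fixed = TRUE)[[1]]
  paste(sort(parts), collapse = ":")
}

canonical_key("F:E") == canonical_key("E:F")    # TRUE, same pair
canonical_key("L:H1") == canonical_key("L:H")   # FALSE, different items
```

Single items such as "A" pass through unchanged, because splitting them yields one piece.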
Here's the code that I've worked on:
sig_var8 <- data.frame(matrix(data = 0, nrow = 1, ncol = 2))
colnames(sig_var8) <- c("var", "count")
sig_var8[, 1] <- as.character(sig_var8[, 1])
sig_var8[, 2] <- as.numeric(sig_var8[, 2])
for (list in 1:3) {
  temp_list <- get(paste0("List", list))  # get the equation above
  assign(paste0("List", list, "a"), gsub(" ", "", temp_list))  # remove all spaces in the sentence
  assign(paste0("List", list, "a_split"), strsplit(get(paste0("List", list, "a")), "[+]"))  # split where "+" are
  temp_listA <- get(paste0("List", list, "a_split"))[[1]]
  for (item in 1:length(temp_listA)) {
    # item already in the table: increment its count
    if (isTRUE(temp_listA[item] %in% sig_var8[, 1])) {
      row_n <- which(sig_var8[, 1] == temp_listA[item])
      sig_var8[row_n, 2] <- sig_var8[row_n, 2] + 1
    }
    # new item: append a row with count 1
    if (isFALSE(temp_listA[item] %in% sig_var8[, 1])) {
      row_n <- nrow(sig_var8)
      sig_var8[row_n + 1, 1] <- temp_listA[item]
      sig_var8[row_n + 1, 2] <- 1
    }
  }
}
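The answers below work on character vectors such as c("A", "B", ...), whereas the question stores each list as a single "+"-separated string. A minimal sketch of that conversion, mirroring the gsub()/strsplit() steps in the loop above but without assign()/get(), could look like this; to_items is a hypothetical helper name.

```r
List1 <- "A + B + C + D + E:F + F:E"
List2 <- "A + B + C + E:F + F:E + G:H + H:G"
List3 <- "J + K + L + L:H + L:H1"

# Hypothetical helper: drop spaces, then split on a literal "+"
# (fixed = TRUE avoids having to escape "+" as a regular expression)
to_items <- function(s) {
  strsplit(gsub(" ", "", s, fixed = TRUE), "+", fixed = TRUE)[[1]]
}

# One character vector per list, kept in a named list
Lst <- lapply(list(List1 = List1, List2 = List2, List3 = List3), to_items)
Lst$List3
```

Lst then has the same shape as the hand-written vectors used in the answers, so it can be passed to their deduplication steps.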
Upvotes: 2
Views: 61
Reputation: 76450
Maybe something like the following does what you want.
Lst <- mget(ls(pattern = "^List"))
Lst <- lapply(Lst, function(x) {
L <- strsplit(x, ":")
res <- sapply(L, function(y){
paste(sort(y), collapse = ":")
})
unique(res)
})
table(unlist(Lst))
#
#    A    B    C    D  E:F  G:H  H:L H1:L    J    K    L
#    2    2    2    1    2    1    1    1    1    1    1
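One caveat if this is run in the same session as the question's loop: that loop uses assign() to create objects such as List1a and List1a_split, and ls(pattern = "^List") matches those too, so mget() would pull them into Lst. A stricter pattern avoids that. The sketch below assumes the lists are character vectors (as this answer's output implies), uses a stand-in List1a object to simulate the leftover helper, and swaps sapply() for vapply() so the result type is fixed; otherwise it follows the same split, sort and paste logic.

```r
List1 <- c("A", "B", "C", "D", "E:F", "F:E")
List2 <- c("A", "B", "C", "E:F", "F:E", "G:H", "H:G")
List3 <- c("J", "K", "L", "L:H", "L:H1")
List1a <- "A+B+C+D+E:F+F:E"  # stand-in for a helper left behind by the question's loop

ls(pattern = "^List")         # also matches "List1a"

# Anchor the pattern so only List1, List2, ... are collected
Lst <- mget(ls(pattern = "^List[0-9]+$"))

Lst <- lapply(Lst, function(x) {
  # sort the pieces of each item so "F:E" and "E:F" give the same key
  keys <- vapply(strsplit(x, ":", fixed = TRUE),
                 function(y) paste(sort(y), collapse = ":"),
                 character(1))
  unique(keys)
})
counts <- table(unlist(Lst))
counts
```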
Upvotes: 3
Reputation: 5138
I am not 100% sure this is what you are looking for, but if it is, I will annotate it.
List1 <- c("A","B","C","D","E:F","F:E")
List2 <- c("A","B","C","E:F","F:E","G:H","H:G")
List3 <- c("J","K","L","L:H","L:H1")
Lst <- list(List1, List2, List3)
keep_me <- lapply(Lst, function(x) !duplicated(lapply(strsplit(x, ":", fixed = T), sort)))
Lst_cleaned <- unlist(Map(`[`, Lst, keep_me))
table(Lst_cleaned)
Lst_cleaned
   A    B    C    D  E:F  G:H    J    K    L  L:H L:H1
   2    2    2    1    2    1    1    1    1    1    1
Edit: added explanation below. Let me know if anything is still unclear or if you run into more issues. I use List1 at the beginning to demonstrate what lapply is doing for each list element. Also, as a side note, breaking it down made me realize you do not need to use which if you do not want to: you can use the logical vector in Map to subset the elements of Lst (the keep_me in the code above already does this).
# Splitting each string on the colon and sorting the pieces
lapply(strsplit(List1, ":", fixed = T), sort)
[[1]]
[1] "A"
[[2]]
[1] "B"
[[3]]
[1] "C"
[[4]]
[1] "D"
[[5]]
[1] "E" "F"
[[6]]
[1] "E" "F"
# Logical vector: TRUE for the elements that are NOT duplicated
!duplicated(lapply(strsplit(List1, ":", fixed = T), sort))
[1] TRUE TRUE TRUE TRUE TRUE FALSE
# which() gives the indices of the TRUE values
which(!duplicated(lapply(strsplit(List1, ":", fixed = T), sort)))
[1] 1 2 3 4 5
# Now, all together: lapply applies the above logic to
# each element in Lst and returns, for each vector, a list of
# the indices that are not duplicates
lapply(Lst, function(x) which(!duplicated(lapply(strsplit(x, ":", fixed = T), sort))))
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] 1 2 3 4 6
[[3]]
[1] 1 2 3 4 5
keep_me <- lapply(Lst, function(x) which(!duplicated(lapply(strsplit(x, ":", fixed = T), sort))))
# Map subsets (`[`) Lst by the indices in keep_me, and unlist
# flattens the list (i.e., unlist makes it a vector)
Map(`[`, Lst, keep_me)
[[1]]
[1] "A" "B" "C" "D" "E:F"
[[2]]
[1] "A" "B" "C" "E:F" "G:H"
[[3]]
[1] "J" "K" "L" "L:H" "L:H1"
unlist(Map(`[`, Lst, keep_me))
[1] "A" "B" "C" "D" "E:F" "A" "B" "C" "E:F" "G:H" "J" "K" "L" "L:H" "L:H1"
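To illustrate the side note about which(): subsetting with the logical vector from !duplicated() selects the same elements as subsetting with the indices from which(), so either form of keep_me can be handed to Map(). A small check, reusing the three vectors from this answer (not_dup is just a local name for the repeated expression):

```r
List1 <- c("A", "B", "C", "D", "E:F", "F:E")
List2 <- c("A", "B", "C", "E:F", "F:E", "G:H", "H:G")
List3 <- c("J", "K", "L", "L:H", "L:H1")
Lst <- list(List1, List2, List3)

# TRUE for the first occurrence of each order-insensitive item
not_dup <- function(x) !duplicated(lapply(strsplit(x, ":", fixed = TRUE), sort))

keep_logical <- lapply(Lst, not_dup)                        # logical vectors
keep_index   <- lapply(Lst, function(x) which(not_dup(x)))  # integer indices

by_logical <- unlist(Map(`[`, Lst, keep_logical))
by_index   <- unlist(Map(`[`, Lst, keep_index))

identical(by_logical, by_index)
# [1] TRUE
table(by_logical)
```

Because duplicated() keeps the first occurrence, the labels stay as written in the data (E:F, L:H, L:H1) rather than being re-sorted to H:L as in the other answers' output.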
Upvotes: 1
Reputation: 2022
Based on @Rui's answer, I think this will do what you want:
List1 <- c("A","B","C","D","E:F","F:E")
List2 <- c("A","B","C","E:F","F:E","G:H","H:G")
List3 <- c("J","K","L","L:H","L:H1")
# make list of all objects starting with List
Lst <- mget(ls(pattern = "^List"))
# function to sort the pieces of a split item and stitch them back together
split.sort <- function(x) {
  if (length(x) > 1) paste0(sort(x), collapse = ":") else x
}
# apply function to each of the Lst lists and remove duplicates
Lst <- lapply(Lst, function(y) unique(sapply(strsplit(y, ":"), split.sort)))
# get frequency
table(unlist(Lst))
#>
#>    A    B    C    D  E:F  G:H  H:L H1:L    J    K    L
#>    2    2    2    1    2    1    1    1    1    1    1
Created on 2019-04-17 by the reprex package (v0.2.1)
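The answers return a table, while the question builds a data frame with var and count columns. A sketch of converting one into the other with base R, assuming the deduplicated items are already in a single character vector (here, the unlist() result shown in the second answer). The placeholder first row ("0", 0) from the question's initialization is not produced.

```r
items <- c("A", "B", "C", "D", "E:F", "A", "B", "C", "E:F", "G:H",
           "J", "K", "L", "L:H", "L:H1")

counts <- table(items)
# as.data.frame() on a 1-D table gives one column for the values and one for Freq
sig_var8 <- as.data.frame(counts, stringsAsFactors = FALSE)
names(sig_var8) <- c("var", "count")
sig_var8
```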
Upvotes: 1