Reputation: 726
I have a list of alphanumeric vectors:
lst <- list(c("三垣3-19", "6", "81497", "79992", "79101",
              "77760", "75973", "75411", "74666"),
            c("蒼龍1-01", "2", "66249", "65474", "66803", "64238"),
            c("蒼龍1-02", "1", "64238"),
            "蒼龍1-03")
[[1]]
[1] "三垣3-19" "6" "81497" "79992"
[5] "79101" "77760" "75973" "75411"
[9] "74666"
[[2]]
[1] "蒼龍1-01" "2" "66249" "65474"
[5] "66803" "64238"
[[3]]
[1] "蒼龍1-02" "1" "64238"
[[4]]
[1] "蒼龍1-03"
The second element of each vector (i.e. 6, 2, 1) is the total number of lines to be drawn to connect the stars, which are identified by the HIP numbers to its right. Each pair of HIP numbers indicates a line drawn between 2 stars.
Hence 81497 79992 in [[1]] would mean "draw a line between star number 81497 and star number 79992", and so on.
In the case of a continuous line, such as [[1]], the numbers between "81497" and "74666" should be repeated so that there is no break in the line.
Thus, in the case of [[1]], "79992" "79101" "77760" "75973" "75411" should be repeated to give the following result:
[[1]]
[1] "三垣3-19" "6" "81497" "79992"
[5] "79992" "79101" "79101" "77760"
[9] "77760" "75973" "75973" "75411"
[13] "75411" "74666"
[[2]]
[1] "蒼龍1-01" "2" "66249" "65474"
[5] "66803" "64238"
[[3]]
[1] "蒼龍1-02" "1" "64238" "64238"
[[4]]
[1] "蒼龍1-03"
Since the second element of each vector represents the total number of lines to be drawn, a validity test can be coded to indicate whether certain numbers need to be repeated. Thus 6 in [[1]] means there should be 6 pairs (i.e. 6 * 2 = 12 elements) of HIP numbers that follow. When the validity test fails, I would like R to repeat the numbers between the third and final elements for me so that the continuous line can be drawn.
The partial solution I managed to cobble together is as follows:
lapply(lst, function(x) x[2]) == (lengths(lst)-2)/2
[1] FALSE TRUE FALSE NA
This tests whether the number of HIP values in each vector matches the declared number of pairs. Only [[2]] passes in the original list. [[1]] and [[3]] are the vectors we need to work on.
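The comparison above relies on comparing a list of strings with numbers, which R handles through implicit coercion. A minimal sketch of a more explicit numeric version, assuming the second element is always a whole number written as text (the names declared, actual and valid are only illustrative):

```r
lst <- list(c("三垣3-19", "6", "81497", "79992", "79101",
              "77760", "75973", "75411", "74666"),
            c("蒼龍1-01", "2", "66249", "65474", "66803", "64238"),
            c("蒼龍1-02", "1", "64238"),
            "蒼龍1-03")

# Declared number of pairs: the second element of each vector, as an integer.
# Vectors with no second element give NA.
declared <- as.integer(vapply(lst, function(x) x[2], character(1)))

# Number of pairs actually present: everything after the first two elements, halved.
actual <- (lengths(lst) - 2) / 2

# TRUE where the vector already holds the declared number of pairs.
valid <- declared == actual
valid
```

This gives the same FALSE TRUE FALSE NA pattern as the one-liner, but as a plain logical vector that can be used for indexing.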
To repeat the inner values of a single vector, I could do this:
> x <- c(1,2,3,4,5)
> x[2:4] <- lapply(x[2:4], function(x) rep(x, 2))
> unlist(x)
[1] 1 2 2 3 3 4 4 5
However, because lst is a list, I cannot do:
lst[2:4] <- lapply(lst[2:4], function(x) rep(x, 2))
to get the same result. The fact that the end index (4, in this case) has to be derived from lengths(lst) further complicates the matter.
I suppose the final code would use ifelse() to combine the two pieces described above.
Clarification of the rule:
The second element of each vector represents the desired number of distinct HIP pairs, each of which draws one line.
[[2]] is valid because exactly 2 pairs of numbers follow, which matches the value given in its second element, so the numbers need not be repeated.
In this case, the lines most probably form a cross rather than a continuous line. So the rule should be applied only in the case of a continuous line, such as [[1]].
As for [[3]], because there is only one point, the number is repeated as a rule, so that the pair count given by the second element still holds.
BUG INQUIRY
@TUSHAr: Your code seems to generate NA values when elements within the vectors contain non-numeric characters.
lst <- list(c("三垣3-19", "6", "81497", "79992A", "79101",
              "77760", "75973A", "75411", "74666"),
            c("蒼龍1-01", "2", "66249", "65474", "66803B", "64238"),
            c("蒼龍1-02", "1", "64238"),
            "蒼龍1-03")
Run the code with the above data and you get:
[[1]]
[1] "三垣3-19" "6" "81497" NA NA
[6] "79101" "79101" "77760" "77760" NA
[11] NA "75411" "75411" "74666"
[[2]]
[1] "蒼龍1-01" "2" "66249" "65474" NA
[6] "64238"
[[3]]
[1] "蒼龍1-02" "1" "64238" "64238"
[[4]]
[1] "蒼龍1-03"
What is causing this, and is there a way to fix it?
Upvotes: 0
Views: 251
Reputation: 3116
Store the first value of each vector in lst in a separate variable id, to avoid unnecessary subsetting during processing.
id = lapply(lst,function(t){t[1]})
Remove the first element, which is already stored in id.
lst = lapply(lst, function(t){
  t = t[-1]
  #if(length(t)>0){
  #  as.integer(t)
  #}
})
Loop through the processed lst object:
temp = lapply(lst, function(t){
  # Use the first value as the desired number of pairs in `reqdpairs`
  reqdpairs = as.numeric(t[1])
  # Remove the first value so that `t` only contains HIP numbers.
  t = t[-1]
  # Count the existing pairs; in case [[2]] this already matches, so no processing is needed
  noofpairs = floor(length(t)/2)
  # Check whether `t` holds more than one HIP number. The `else if` branch covers case [[3]]
  if(length(t) > 1){
    # If `noofpairs` is not equal to `reqdpairs`, use `rep` on the inner elements (**excluding the first and last element**) of the vector.
    if(noofpairs != reqdpairs){
      pairs = c(reqdpairs, t[1], rep(t[-c(1,length(t))], each=2), t[length(t)])
    }else{
      # No processing is required here, so just prepend `reqdpairs` to `t` as it is
      pairs = c(reqdpairs, t)
    }
  }else if(length(t) == 1){
    # A single point: repeat it so that it forms one pair
    pairs = rep(t[1], times=2)
    pairs = c(reqdpairs, pairs)
  }else{
    # No HIP numbers at all (case [[4]])
    pairs = NULL
  }
  pairs = as.character(pairs)
})
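The key step in the loop above is rep(..., each=2) applied to the inner elements only. As a small sketch in isolation (the vector hip and the names inner, chained and segments are made up for illustration):

```r
# A continuous line through four stars, given once each.
hip <- c("81497", "79992", "79101", "74666")

# Inner elements: everything except the first and the last.
inner <- hip[-c(1, length(hip))]

# Duplicate each inner element so that every star that ends one
# segment also starts the next one.
chained <- c(hip[1], rep(inner, each = 2), hip[length(hip)])
chained
# [1] "81497" "79992" "79992" "79101" "79101" "74666"

# Reading the result two at a time gives one row per line segment.
segments <- matrix(chained, ncol = 2, byrow = TRUE)
```

Four stars on a continuous line give three segments, which is why the inner values have to appear twice.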
This step merges id with temp to achieve the desired output format. Basically just a concatenation step.
mapply(function(x,y){c(x,y)}, id, temp, SIMPLIFY=FALSE)
#[[1]]
#[1] "三垣3-19" "6" "81497" "79992" "79992" "79101" "79101" "77760" "77760" "75973"
#[11] "75973" "75411" "75411" "74666"
#[[2]]
#[1] "蒼龍1-01" "2" "66249" "65474" "66803" "64238"
#[[3]]
#[1] "蒼龍1-02" "1" "64238" "64238"
#[[4]]
#[1] "蒼龍1-03"
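On the NA values reported in the bug inquiry: a likely cause, assuming an earlier revision of this code ran the now commented-out as.integer(t) step, is that as.integer() cannot parse strings such as "79992A" and returns NA for them. A short sketch of that behaviour (the vector hip and the name converted are only illustrative):

```r
hip <- c("81497", "79992A", "79101")

# Converting to integer turns any non-numeric string into NA
# (R also emits a coercion warning, suppressed here).
converted <- suppressWarnings(as.integer(hip))
converted
# [1] 81497    NA 79101

# Keeping the values as character avoids the problem, and rep()
# works on character vectors just as well.
kept <- rep(hip[2], each = 2)
kept
# [1] "79992A" "79992A"
```

Since the HIP values are only copied around and never used in arithmetic, leaving them as character strings should be enough.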
Upvotes: 1