Reputation: 728
I am working on a simple problem in R, but I have not figured it out yet:
Given a vector vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada", ..., "Amada + Steven", "Steven + Henry"), I want to create a new vector vect2 that contains all the elements in vect1 plus new elements built by the following rule: for every two strings "A+B" and "B+C", we concatenate them into "A+C" and add this new element to vect2. Can anyone please help me do this?
Also, I want to get all the elements standing in front of + in each string. Is the following code correct?
for (i in length(vect1)){
vect3[i] <- regexpr(".*+", vect1[i])
}
Third question: if I have a data frame d with a Date column in the format %d-%b (for example, 01-Apr), how do I sort this data frame in increasing order of Date? Let's just say d <- c("01-Apr", "01-Mar", "02-Jan", "31-June", "30-May").
Upvotes: 0
Views: 119
Reputation: 5272
I think you could (and should) avoid both for loops and external libraries when they are not required. So this might be a solution:
# create data
vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada", "Amada + Steven", "Steven + Henry")
# create a matrix of pairs with white space removed
pairsMatrix <- do.call(rbind, sapply(vect1, function(v) strsplit(gsub(pattern = " ", replacement = "", x = v), "\\+")))
# remove dimnames (not necessary though)
dimnames(pairsMatrix) <- NULL
# for each row of pairsMatrix, check whether its second element appears as a first element elsewhere; bind the resulting pairs to the original ones
allPairs <- do.call(rbind, c(list(pairsMatrix), apply(pairsMatrix, 1, function(names) c(names[1], pairsMatrix[names[2]==pairsMatrix[,1], 2]))))
# filter out self-relationships (rows whose two elements are equal)
allPairs[allPairs[,1]!=allPairs[,2],]
[,1] [,2]
[1,] "Andy" "Pete"
[2,] "Mary" "Pete"
[3,] "Pete" "Amada"
[4,] "Amada" "Steven"
[5,] "Steven" "Henry"
[6,] "Andy" "Amada"
[7,] "Mary" "Amada"
[8,] "Pete" "Steven"
[9,] "Amada" "Henry"
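If a name can be followed by more than one other name, the same A+B / B+C join can also be written as a self-join with base R's merge(). This is only a sketch of an alternative, not the code above: the helper names (pairs, first, second, joined, new_elements) are made up for illustration, and the elided "..." entries of the question's vector are left out.

```r
# Data from the question (without the elided "..." entries).
vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada", "Amada + Steven", "Steven + Henry")

# Split each string on a literal "+" and trim the surrounding white space.
parts <- strsplit(vect1, "+", fixed = TRUE)
pairs <- data.frame(
  left  = trimws(vapply(parts, `[`, character(1), 1)),
  right = trimws(vapply(parts, `[`, character(1), 2)),
  stringsAsFactors = FALSE
)

# Two views of the same pairs: "A+B" rows and "B+C" rows.
first  <- setNames(pairs, c("A", "B"))
second <- setNames(pairs, c("B", "C"))

# Joining on B yields one row per (A, B, C) chain, including multiple matches.
joined <- merge(first, second, by = "B")

# Drop chains that would produce "A+A".
joined <- joined[joined$A != joined$C, ]

new_elements <- paste(joined$A, joined$C, sep = "+")
vect2 <- c(vect1, new_elements)
vect2
```

For the five strings above this adds "Andy+Amada", "Mary+Amada", "Pete+Steven" and "Amada+Henry" (the order of the new elements follows merge()'s sorting, not the input order).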
Concerning your last point, I think a simple sort on a proper Date object will do it.
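A minimal sketch of that idea for a data frame, under a few assumptions: the value column is hypothetical, the year 2018 is pinned arbitrarily so the result does not depend on the current date, and month names are parsed as English.

```r
# %b is locale-dependent; use the C locale so "Jan", "Mar", ... are recognised.
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical data frame; the value column is for illustration only.
d <- data.frame(
  Date  = c("01-Apr", "01-Mar", "02-Jan", "31-June", "30-May"),
  value = 1:5,
  stringsAsFactors = FALSE
)

# Append an assumed year and parse to a proper Date object.
# "31-June" is not a real date, so it becomes NA.
parsed <- as.Date(paste0(d$Date, "-2018"), format = "%d-%b-%Y")

# order() places NA last by default, so the invalid row ends up at the bottom.
d_sorted <- d[order(parsed), ]
d_sorted
```

The rows come out as 02-Jan, 01-Mar, 01-Apr, 30-May, with the unparseable 31-June row last.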
Upvotes: 1
Reputation: 1095
I think this should do it, but I did things I probably shouldn't do, like growing objects and nesting for loops. If you want to access all elements in front of the '+', just use name.matrix[,1].
vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada","Amada + Steven", "Steven + Henry")
library(stringr)
name.matrix <- matrix(do.call('rbind',str_split(vect1, pattern = "\\s?[+]\\s?")), ncol = 2)
new.stuff <- c()
for(x in unique(name.matrix[,2])){
  sub.mat.1 <- matrix(name.matrix[name.matrix[,2] == x,], ncol = 2)
  sub.mat.2 <- matrix(name.matrix[name.matrix[,1] == x,], ncol = 2)
  if(length(sub.mat.1) && length(sub.mat.2)){
    for(y in seq_along(sub.mat.1[,2])){
      new.add <- paste0(sub.mat.1[y,1],'+', sub.mat.2[,2])
      new.stuff <- c(new.stuff, new.add)
    }
  }
}
vect2 <- c(vect1, new.stuff)
vect2
#[1] "Andy+Pete" "Mary + Pete" "Pete+ Amada" "Amada + Steven" "Steven + Henry" "Andy+Amada"
#[7] "Mary+Amada" "Pete+Steven" "Amada+Henry"
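On the second question, the loop as posted probably won't give the names: for (i in length(vect1)) runs only once (with i equal to the length), regexpr() returns match positions rather than the matched text, and + is a regex metacharacter that has to be escaped. A base R sketch without stringr, assuming every string contains exactly one '+':

```r
vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada", "Amada + Steven", "Steven + Henry")

# Remove everything from the (optional) spaces before the "+" to the end of the string.
# "\\+" matches a literal plus sign; sub() is vectorised, so no loop is needed.
vect3 <- sub("\\s*\\+.*$", "", vect1)
vect3
#[1] "Andy"   "Mary"   "Pete"   "Amada"  "Steven"
```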
Update:
Third question. Well, there are only 30 days in June, so you're going to get an NA there. If it's a data.frame that you're trying to sort by date, you'll need to use the form df[order(df$Date),]. The lubridate package might also be helpful when working with dates.
d <- c('01-Apr','01-Mar','02-Jan','31-June','30-May')
d.new <- as.Date(d, format = '%d-%b')
d.new <- d.new[order(d.new)]
d.new
#[1] "2018-01-02" "2018-03-01" "2018-04-01" "2018-05-30" NA
Upvotes: 1