user177196

Reputation: 728

Concatenate two strings with common elements

I am working on a simple problem in R, but I have not figured it out yet:

Given a vector vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada", ..., "Amada + Steven", "Steven + Henry"), I want to create a new vector vect2 that contains all the elements of vect1 plus new elements built as follows: for every two strings "A+B" and "B+C", combine them into "A+C" and add this new element to vect2. Can anyone please help me do this?

Also, I want to get the element standing in front of the + in each string. Is the following code correct?

for (i in length(vect1)){ vect3[i] <- regexpr(".*+", vect1[i]) }

Third question: if I have a data frame d with a Date column in the format %d-%b (for example, 01-Apr), how do I sort this data frame in increasing order of Date? Let's just say the column is d <- c("01-Apr", "01-Mar", "02-Jan", "31-June", "30-May").

Upvotes: 0

Views: 119

Answers (2)

ClementWalter

Reputation: 5272

I think you could (and should) avoid both for loops and external libraries when they are not required.

So this might be a solution:

# create data
vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada", "Amada + Steven", "Steven + Henry")

# create a matrix of pairs, with white space removed
pairsMatrix <- do.call(rbind, sapply(vect1, function(v) strsplit(gsub(pattern = " ", replacement = "", x = v), "\\+")))

# remove dimnames (not necessary though)
dimnames(pairsMatrix) <- NULL

# for each row of pairsMatrix, check whether its second element appears as the
# first element of another row, and bind the resulting pairs to the original ones
allPairs <- do.call(rbind, c(list(pairsMatrix), apply(pairsMatrix, 1, function(names) c(names[1], pairsMatrix[names[2]==pairsMatrix[,1], 2]))))

# drop self-pairs: a row without a match is recycled by rbind into c(name, name)
allPairs[allPairs[,1]!=allPairs[,2],]

      [,1]     [,2]    
 [1,] "Andy"   "Pete"  
 [2,] "Mary"   "Pete"  
 [3,] "Pete"   "Amada" 
 [4,] "Amada"  "Steven"
 [5,] "Steven" "Henry" 
 [6,] "Andy"   "Amada" 
 [7,] "Mary"   "Amada" 
 [8,] "Pete"   "Steven"
 [9,] "Amada"  "Henry" 
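
The same pairing can also be written as a self-join, which may be easier to follow and keeps every match when one name starts several pairs. What follows is only a sketch in base R that also turns the pairs back into "A+C" strings; `first`, `second`, `left`, `right`, `chained` and `new_elems` are names chosen here for illustration and are not part of the code above.

```r
vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada", "Amada + Steven", "Steven + Henry")

# strip spaces, then split each string on a literal "+"
parts  <- strsplit(gsub(" ", "", vect1, fixed = TRUE), "+", fixed = TRUE)
first  <- vapply(parts, `[`, character(1), 1)
second <- vapply(parts, `[`, character(1), 2)

# "A+B" rows on the left, "B+C" rows on the right, joined on the shared B
left    <- data.frame(A = first, B = second, stringsAsFactors = FALSE)
right   <- data.frame(B = first, C = second, stringsAsFactors = FALSE)
chained <- merge(left, right, by = "B")

# drop self-pairs such as "A+A", then build the new "A+C" strings
chained   <- chained[chained$A != chained$C, ]
new_elems <- paste(chained$A, chained$C, sep = "+")

# original elements first, new ones appended (row order of merge is not relied on)
vect2 <- unique(c(vect1, new_elems))
vect2
```

Note that merge() sorts its result by the join column by default, so the new elements may come out in a different order than in the matrix above.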

Concerning your last point, I think a simple sort on a proper Date object will do it.
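
As a rough sketch of that idea: parse the column into Date objects and reorder the rows with order(). The data frame `d`, its `value` column and the fixed year 2018 are assumptions made for this example. `%b` is matched against the month names of the current locale, so the sketch switches LC_TIME to "C" to get English names, and it uses "30-Jun" in place of the impossible "31-June".

```r
# assumption: force English month names so that %b matches "Jan", "Mar", ...
invisible(Sys.setlocale("LC_TIME", "C"))

d <- data.frame(Date  = c("01-Apr", "01-Mar", "02-Jan", "30-Jun", "30-May"),
                value = 1:5,
                stringsAsFactors = FALSE)

# append an arbitrary year so the result does not depend on the current year
parsed <- as.Date(paste0(d$Date, "-2018"), format = "%d-%b-%Y")

# reorder the rows by the parsed dates; the original strings stay untouched
d_sorted <- d[order(parsed), ]
d_sorted$Date
# [1] "02-Jan" "01-Mar" "01-Apr" "30-May" "30-Jun"
```

Keeping the parsed dates in a separate vector leaves the Date column in its original text format, which may or may not be what you want.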

Upvotes: 1

Balter

Reputation: 1095

I think this should do it, but I did things I probably shouldn't do, like growing objects and nesting for loops. If you want to access all elements in front of the '+', just use name.matrix[,1].

vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada","Amada + Steven", "Steven + Henry")

library(stringr)

name.matrix <- matrix(do.call('rbind',str_split(vect1, pattern = "\\s?[+]\\s?")), ncol = 2)

new.stuff <- c()

# for every name x that appears on the right-hand side of a '+'
for(x in unique(name.matrix[,2])){
  # rows of the form "A+x" and rows of the form "x+C"
  sub.mat.1 <- matrix(name.matrix[name.matrix[,2] == x,], ncol = 2)
  sub.mat.2 <- matrix(name.matrix[name.matrix[,1] == x,], ncol = 2)
  if(length(sub.mat.1) && length(sub.mat.2)){
    for(y in seq_along(sub.mat.1[,2])){
      # pair each A with every C reachable through x
      new.add <- paste0(sub.mat.1[y,1],'+', sub.mat.2[,2])
      new.stuff <- c(new.stuff, new.add)
    }
  }
}

vect2 <- c(vect1, new.stuff)
vect2
#[1] "Andy+Pete"      "Mary + Pete"    "Pete+ Amada"    "Amada + Steven" "Steven + Henry" "Andy+Amada"    
#[7] "Mary+Amada"     "Pete+Steven"    "Amada+Henry" 
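
On the second question taken by itself: the loop in the question would not work as written, because `for (i in length(vect1))` visits only the last index, an unescaped `+` in a regular expression is a quantifier rather than a literal plus sign, and regexpr() returns match positions rather than the matched text. A vectorised sketch in base R, assuming each string contains a single `+` (`before_plus` is a name made up for this example):

```r
vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada", "Amada + Steven", "Steven + Henry")

# delete everything from the literal "+" onwards, then trim leftover whitespace
before_plus <- trimws(sub("\\+.*$", "", vect1))
before_plus
# [1] "Andy"   "Mary"   "Pete"   "Amada"  "Steven"
```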

Update:

Third question: June has only 30 days, so "31-June" cannot be parsed and becomes NA. Because no year is given, as.Date() fills in the current year (2018 in the output below). If you are sorting a data.frame by date, you'll need to use the form df[order(df$Date), ]. The lubridate package might also be helpful when working with dates.

d <- c('01-Apr','01-Mar','02-Jan','31-June','30-May')

d.new <- as.Date(d, format = '%d-%b')
d.new <- d.new[order(d.new)]
d.new
#[1] "2018-01-02" "2018-03-01" "2018-04-01" "2018-05-30" NA  
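
If the NA is unwanted, one option is to find the strings that failed to parse and leave them out of the ordering. This is only a sketch: the fixed year 2018, the names `parsed`, `bad` and `d.sorted`, and the switch of LC_TIME to "C" (so that `%b` matches English month names) are assumptions added here.

```r
# assumption: force English month names so that %b matches "Apr", "Mar", ...
invisible(Sys.setlocale("LC_TIME", "C"))

d <- c('01-Apr', '01-Mar', '02-Jan', '31-June', '30-May')

# append an arbitrary year so the result does not depend on the current year
parsed <- as.Date(paste0(d, '-2018'), format = '%d-%b-%Y')

# strings that are not valid dates
bad <- d[is.na(parsed)]
bad
# [1] "31-June"

# order the original strings by date, dropping the unparseable ones
d.sorted <- d[order(parsed, na.last = NA)]
d.sorted
# [1] "02-Jan" "01-Mar" "01-Apr" "30-May"
```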

Upvotes: 1
