johnny

Reputation: 473

Rearrange words in strings depending on conditions

I have this data:

df <- data.frame("position" = c("ante", "ex", "post", "post ante pre", "post pre", "ante post pre", "ex pre", "ante pre"))

Now I want to move the word "pre" to the front of the string, but only in strings that consist of exactly two words, one of which is "pre". Rows 1, 2, 3, 4 and 6 should therefore not be affected.

This should be the result:

df <- data.frame("position" = c("ante", "ex", "post", "post ante pre", "pre post", "ante post pre", "pre ex", "pre ante"))

I guess I can start by writing a grepl statement to only select the rows containing the word "pre" but after that I'm a bit lost.

Upvotes: 1

Views: 126

Answers (3)

Chris

Reputation: 3986

You can use regex for this:

First, I edited your example so that the starting data and the desired result differ (I am assuming this is the result you want, based on what you wrote):

library(dplyr)
library(stringr)


df <- data.frame("position" = c("ante", "ex", "post", "post pre ante", "post pre", "ante post pre", "ex pre", "pre ante")) 


df
#>        position
#> 1          ante
#> 2            ex
#> 3          post
#> 4 post pre ante
#> 5      post pre
#> 6 ante post pre
#> 7        ex pre
#> 8      pre ante
df2 <- data.frame("position" = c("ante", "ex", "post", "post pre ante", "pre post", "ante post pre", "pre ex", "pre ante"))
df2
#>        position
#> 1          ante
#> 2            ex
#> 3          post
#> 4 post pre ante
#> 5      pre post
#> 6 ante post pre
#> 7        pre ex
#> 8      pre ante


Then using regex:

df3 <- df %>%
  mutate(position = str_replace(position,'^([^\\s]+) {1}(?=pre$)(pre)','\\2 \\1'))

df3
#>        position
#> 1          ante
#> 2            ex
#> 3          post
#> 4 post pre ante
#> 5      pre post
#> 6 ante post pre
#> 7        pre ex
#> 8      pre ante

identical(df2, df3)
#> [1] TRUE

Slight edit: I think the lookahead is unnecessary, because anchoring the captured "pre" with $ already requires it to be the last word, so we can reduce this to:

df3 <- df %>%
  mutate(position = str_replace(position,'^([^\\s]+) {1}(pre)$','\\2 \\1'))
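If you would rather not load dplyr and stringr, the same reduced pattern can probably be sketched in base R with sub(). This is only a sketch: it assumes the words are separated by single spaces, it rebuilds the example data so that it runs on its own, and perl = TRUE is set purely so that \\S is read as "non-whitespace" in the PCRE sense.

```r
# Stand-alone copy of the example data used above
df <- data.frame(position = c("ante", "ex", "post", "post pre ante", "post pre",
                              "ante post pre", "ex pre", "pre ante"),
                 stringsAsFactors = FALSE)

# ^(\\S+) captures a single leading word; " pre$" requires "pre" to be
# the second and last word, so only two-word strings ending in "pre" match
df$position <- sub("^(\\S+) pre$", "pre \\1", df$position, perl = TRUE)

df$position
```

Strings that do not match the pattern are returned unchanged by sub(), so no separate filtering step is needed.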

Upvotes: 3

Donald Seinen

Reputation: 4419

I changed the original data slightly: the string at position 7 is now "ex pre", and the goal is to turn it into "pre ex". One could use the stringr package and a for loop.

df <- data.frame("position" = c("ante", "ex", "post", "post ante pre", "pre post", "ante post pre", "ex pre", "pre ante")) 

We want to change only position 7:

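library(stringr)

for (i in seq_len(nrow(df))) {
  # first check: the string has exactly two words; second check: the second word is "pre"
  if (sapply(strsplit(df[i, ], " "), length) == 2 && str_split(df[i, ], " ")[[1]][2] == "pre") {
    # rebuild the string with the two words swapped
    df[i, ] <- str_flatten(unlist(str_split(df[i, ], " "))[2:1], collapse = " ")
  }
}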

gives

       position
1          ante
2            ex
3          post
4 post ante pre
5      pre post
6 ante post pre
7        pre ex
8      pre ante

A brief explanation of the loop: for each row (string) in the df, split the string and check whether the result has exactly two words. Then split the string again (the result of str_split is a list) and compare its second word to "pre". If both conditions are TRUE, rebuild the string with element 2 first and element 1 second.

Note: Splitting each string twice is most likely not optimal if, for example, you want to apply this to a very large data frame.
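To illustrate the note above, here is a rough sketch that splits every string only once and then works on the stored pieces. It uses base R (strsplit, lengths, vapply) instead of stringr, which is my substitution and not part of the loop above; the name is_target is made up for this example, and single spaces between words are assumed.

```r
# Stand-alone copy of the modified data from this answer
df <- data.frame(position = c("ante", "ex", "post", "post ante pre", "pre post",
                              "ante post pre", "ex pre", "pre ante"),
                 stringsAsFactors = FALSE)

# Split every string exactly once and keep the pieces
words <- strsplit(df$position, " ")

# TRUE only for two-word strings whose last word is "pre"
is_target <- lengths(words) == 2 &
  vapply(words, function(w) w[length(w)] == "pre", logical(1))

# Swap the two words for the selected rows only
df$position[is_target] <- vapply(words[is_target],
                                 function(w) paste(rev(w), collapse = " "),
                                 character(1))

df$position
```

Because the pieces are computed once and the swap is applied only to the selected rows, this avoids the repeated splitting, although I have not benchmarked it.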

Upvotes: 1

bouncyball

Reputation: 10761

I would use a for loop to do this. First, split the string by spaces, and then do a few logical checks to see if changes need to be made:

newtext <- df$position

for (i in seq_along(newtext)) {

  # split the i-th string into its words
  split_x <- strsplit(newtext[i], split = " ")[[1]]

  # only two-word strings that contain "pre" are rearranged
  if (length(split_x) == 2 && "pre" %in% split_x) {
    newtext[i] <- paste("pre", setdiff(split_x, "pre"))
  }

}
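The loop leaves the result in newtext, so it still has to be written back to the data frame. As a sketch of one way to do that, the same checks can be wrapped in a small function and applied to the column directly. The name move_pre is hypothetical (it is not from any package), and the data below is the example from the question, repeated so the snippet runs on its own.

```r
# move_pre() is a hypothetical helper name used only for this sketch
move_pre <- function(x) {
  words <- strsplit(x, split = " ")[[1]]
  # rearrange only two-word strings that contain "pre"
  if (length(words) == 2 && "pre" %in% words) {
    return(paste("pre", setdiff(words, "pre")))
  }
  x  # everything else is returned unchanged
}

# Example data from the question
df <- data.frame(position = c("ante", "ex", "post", "post ante pre", "post pre",
                              "ante post pre", "ex pre", "ante pre"),
                 stringsAsFactors = FALSE)

# Apply the helper to each string and store the result in the data frame
df$position <- vapply(df$position, move_pre, character(1), USE.NAMES = FALSE)

df$position
```

Note that this logic puts "pre" first regardless of whether it was the first or second word, so a string that already starts with "pre" comes back unchanged.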

Upvotes: 2
