Reputation: 473
I have this data:
df <- data.frame("position" = c("ante", "ex", "post", "post ante pre", "post pre", "ante post pre", "ex pre", "ante pre"))
Now I want to move the word "pre" so that it's the first word in the string, but only for the strings containing two words and the word "pre", so row numbers 1, 2, 3, 4 and 6 should not be affected.
This should be the result:
df <- data.frame("position" = c("ante", "ex", "post", "post ante pre", "pre post", "ante post pre", "pre ex", "pre ante"))
I guess I can start by writing a grepl statement to only select the rows containing the word "pre" but after that I'm a bit lost.
Upvotes: 1
Views: 126
Reputation: 3986
You can use a regular expression for this.
First, I edited your example so that the input and the desired result actually differ (I'm assuming this is the result you want, based on what you wrote):
library(dplyr)
library(stringr)
df <- data.frame("position" = c("ante", "ex", "post", "post pre ante", "post pre", "ante post pre", "ex pre", "pre ante"))
df
#> position
#> 1 ante
#> 2 ex
#> 3 post
#> 4 post pre ante
#> 5 post pre
#> 6 ante post pre
#> 7 ex pre
#> 8 pre ante
df2 <- data.frame("position" = c("ante", "ex", "post", "post pre ante", "pre post", "ante post pre", "pre ex", "pre ante"))
df2
#> position
#> 1 ante
#> 2 ex
#> 3 post
#> 4 post pre ante
#> 5 pre post
#> 6 ante post pre
#> 7 pre ex
#> 8 pre ante
Then using regex:
df3 <- df %>%
  mutate(position = str_replace(position, '^([^\\s]+) {1}(?=pre$)(pre)', '\\2 \\1'))
df3
#> position
#> 1 ante
#> 2 ex
#> 3 post
#> 4 post pre ante
#> 5 pre post
#> 6 ante post pre
#> 7 pre ex
#> 8 pre ante
identical(df2, df3)
#> [1] TRUE
Slight edit: I think the lookahead was unnecessary so we can reduce this to:
df3 <- df %>%
  mutate(position = str_replace(position, '^([^\\s]+) {1}(pre)$', '\\2 \\1'))
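For reference, the same idea can be sketched without any packages. This is only an illustration, assuming the edited example data above; the vector names `position` and `result` are placeholders. The pattern reads: start of string, one run of non-space characters (captured), a single space, then "pre" at the end of the string. Strings with one or three words, or with "pre" already first, do not match and are returned unchanged.

```r
# Illustrative base R sketch: same input as the edited example above
position <- c("ante", "ex", "post", "post pre ante", "post pre",
              "ante post pre", "ex pre", "pre ante")

# ^([^ ]+)  one word at the start of the string (capture group 1)
# " pre$"   a single space followed by "pre" at the end of the string
# Replacement puts "pre" first, then the captured word
result <- sub("^([^ ]+) pre$", "pre \\1", position)
result
```

Because the pattern is anchored at both ends, only strings that consist of exactly one word followed by "pre" are rewritten.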
Upvotes: 3
Reputation: 4419
I changed the original data slightly: the string at position 7 is now "ex pre", and the goal is to turn it into "pre ex". One option is to use the stringr package and a for loop.
df <- data.frame("position" = c("ante", "ex", "post", "post ante pre", "pre post", "ante post pre", "ex pre", "pre ante"))
Only position 7 should change:
library(stringr)
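for (i in seq_len(nrow(df))) {
  # Condition 1: the string has exactly two words.
  # Condition 2: the second word is "pre".
  if (length(strsplit(df$position[i], " ")[[1]]) == 2 &&
      str_split(df$position[i], " ")[[1]][2] == "pre") {
    # Swap the two words and join them back with a space
    df$position[i] <- str_flatten(str_split(df$position[i], " ")[[1]][2:1], collapse = " ")
  }
}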
gives
position
1 ante
2 ex
3 post
4 post ante pre
5 pre post
6 ante post pre
7 pre ex
8 pre ante
A brief explanation of the loop: for every row (string) in the data frame, split the string and check whether it has exactly two words. Then split the string again (the result of str_split is a list) and compare the second word with "pre", which gives TRUE or FALSE. If both conditions are TRUE, reverse the order so that element 2 comes first, followed by element 1.
Note: Checking and splitting twice is most likely not optimal if, for example, you want to apply this to a very large data frame.
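One way to avoid the repeated splitting is to split every string once and work on the whole column at a time. The following is a minimal sketch of that idea, not a benchmarked solution; the names `position`, `parts` and `swap` are placeholders chosen for illustration, and the data is the same as in this answer.

```r
# Illustrative sketch: split once, then swap only the matching strings
position <- c("ante", "ex", "post", "post ante pre", "pre post",
              "ante post pre", "ex pre", "pre ante")

# One split per string; the result is a list of character vectors
parts <- strsplit(position, " ", fixed = TRUE)

# TRUE where the string has exactly two words and the second one is "pre"
swap <- lengths(parts) == 2 &
  vapply(parts, function(p) identical(p[2], "pre"), logical(1))

# Reverse the two words for the selected strings only
position[swap] <- vapply(parts[swap],
                         function(p) paste(rev(p), collapse = " "),
                         character(1))
position
```

Here `identical(p[2], "pre")` is used instead of `==` so that one-word strings, where `p[2]` is `NA`, simply give `FALSE`.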
Upvotes: 1
Reputation: 10761
I would use a for loop to do this. First, split the string by spaces, and then do a few logical checks to see if changes need to be made:
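newtext <- df$position
for (i in seq_along(newtext)) {
  # Split the current string into its words
  split_x <- strsplit(newtext[i], split = " ")[[1]]
  # Only two-word strings that contain "pre" are rewritten
  if (length(split_x) == 2 && "pre" %in% split_x) {
    # Put "pre" first, followed by the other word
    newtext[i] <- paste("pre", setdiff(split_x, "pre"))
  }
}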
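To check this kind of logic against the result the question asks for, the split-and-check step can be wrapped in a small function and compared with the expected vector. This is a sketch only: `move_pre_first` is a hypothetical helper name, and `input` and `expected` are copied from the question's two data frames.

```r
# Hypothetical helper: applies the two-word / contains-"pre" rule to one string
move_pre_first <- function(s) {
  words <- strsplit(s, " ", fixed = TRUE)[[1]]
  if (length(words) == 2 && "pre" %in% words) {
    # Put "pre" first, followed by the other word
    s <- paste(c("pre", setdiff(words, "pre")), collapse = " ")
  }
  s
}

# Input and desired result as given in the question
input <- c("ante", "ex", "post", "post ante pre", "post pre",
           "ante post pre", "ex pre", "ante pre")
expected <- c("ante", "ex", "post", "post ante pre", "pre post",
              "ante post pre", "pre ex", "pre ante")

result <- vapply(input, move_pre_first, character(1), USE.NAMES = FALSE)
identical(result, expected)
#> [1] TRUE
```

The result can then be written back with something like `df$position <- result`.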
Upvotes: 2