ip2018

Reputation: 715

Replace text after underscore from rows of a column of a dataframe in dplyr pipe

An example df:

 a = c("a_1", "b_1", "b_2", "b_3", "c_1")
 b = c(1,2,3,4,5)
 df = cbind.data.frame(a,b)

How do I replace all text after, and including, the underscore using str_replace in a dplyr pipe?

The following does not work:

df_1 = df %>% filter(b >= 1.5) %>% str_replace_all(df$a, "_*", "")

Upvotes: 1

Views: 507

Answers (1)

akuiper

Reputation: 215057

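You are missing the dot: use _.* instead of _*. The * is a quantifier that applies to the character directly before it, so _* matches zero or more underscores. In _.* the . matches any character, so the pattern matches an underscore and then everything after it. Note also that str_replace has to be called inside mutate so that it operates on the column rather than on the piped data frame: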

df %>% mutate(new_a = str_replace(a, '_.*', as.character(b)))

#    a b new_a
#1 a_1 1    a1
#2 b_1 2    b2
#3 b_2 3    b3
#4 b_3 4    b4
#5 c_1 5    c5

Or, if you just want to remove the trailing part:

df %>% mutate(new_a = str_replace(a, '_.*', ''))

#    a b new_a
#1 a_1 1     a
#2 b_1 2     b
#3 b_2 3     b
#4 b_3 4     b
#5 c_1 5     c
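
For completeness, here is a minimal sketch that puts the two patterns side by side and then applies the fix together with the filter step from the question. It assumes dplyr and stringr are installed; the data frame is rebuilt with data.frame() only so that the snippet is self-contained.

```r
library(dplyr)
library(stringr)

# Same example data as in the question
df <- data.frame(a = c("a_1", "b_1", "b_2", "b_3", "c_1"),
                 b = c(1, 2, 3, 4, 5))

# "_*" only ever matches underscores (or nothing), so the digits stay in place
only_underscores <- str_replace_all(df$a, "_*", "")

# "_.*" matches the underscore and everything after it
trailing_removed <- str_replace_all(df$a, "_.*", "")

# The pipe from the question, with the replacement moved inside mutate()
df_1 <- df %>%
  filter(b >= 1.5) %>%
  mutate(a = str_replace(a, "_.*", ""))
```

Here str_replace is enough because _.* consumes the rest of the string in a single match; str_replace_all would give the same result for this pattern.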

Upvotes: 2
