Substitute values in a data frame based on a partial match

This is my data

> df1
        col1      col2
1  0/0:6:6,0 0/0:6:6,0
2  0/0:6:6,0 0/1:6:6,0
...
6  1/1:6:6,0 0/0:6:6,0
7  0/0:8:8,0 0/0:8:8,0

I want to replace long entries like "0/0:6:6,0" with just 0 if they start with "0/0", with 0.5 if they start with "0/1", and so on.

So far I have tried this:

1) replace-starts_with

df %>% mutate(col1 = replace(col1, starts_with("0/0"), 0)) %>% head()
    Error in mutate_impl(.data, dots) : 
      Evaluation error: Variable context not set.
    In addition: Warning message:
    In `[<-.factor`(`*tmp*`, list, value = 0) :
      invalid factor level, NA generated

2) grep (seen this as a solution here)

df[,1][grep("0/1",df[,1])]<-0.5
Warning message:
In `[<-.factor`(`*tmp*`, grep("0/1", df[, 1]), value = c(NA, 2L,  :
  invalid factor level, NA generated

Feeling lost... it's been a long day

Upvotes: 1

Views: 164

Answers (1)

akrun

Reputation: 886948

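We can use grepl to build a logical index of the matching rows and pass it to replace. This assumes col1 is a character column; the `[<-.factor` warnings in the question suggest it is currently a factor, in which case it has to be converted with as.character first.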

df1 %>%
   mutate(col1 = replace(col1, grepl("^0/0", col1), 0))
#       col1      col2
#1         0 0/0:6:6,0
#2         0 0/1:6:6,0
#3 1/1:6:6,0 0/0:6:6,0
#4         0 0/0:8:8,0

Or use startsWith from base R, which requires a character vector as its first argument

df1 %>%
    mutate(col1 = replace(col1, startsWith(col1, "0/0"), 0))
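
If the columns are factors, as the warnings in the question suggest, a conversion step is needed before either approach works. The following is only a sketch: the data frame below is a hypothetical reconstruction of the four rows visible in the question, built with stringsAsFactors = TRUE to mimic the factor columns, and it uses base R only.

```r
# Hypothetical reconstruction of the rows shown in the question,
# deliberately built with factor columns
df1 <- data.frame(
  col1 = c("0/0:6:6,0", "0/0:6:6,0", "1/1:6:6,0", "0/0:8:8,0"),
  col2 = c("0/0:6:6,0", "0/1:6:6,0", "0/0:6:6,0", "0/0:8:8,0"),
  stringsAsFactors = TRUE
)

# Convert every factor column to character; other columns are left alone
df1[] <- lapply(df1, function(x) if (is.factor(x)) as.character(x) else x)

# startsWith() now receives the character vector it requires
df1$col1 <- replace(df1$col1, startsWith(df1$col1, "0/0"), "0")
```

After the conversion, assigning a new value no longer triggers the "invalid factor level" warning, because the column is a plain character vector.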

The issue with dplyr::starts_with is that it is a helper function to select variables based on their names

df1 %>%
    select(starts_with('col1'))
#       col1
#1 0/0:6:6,0
#2 0/0:6:6,0
#6 1/1:6:6,0
#7 0/0:8:8,0

It does not look at the values of the variables, whereas startsWith returns a logical vector, just as grepl does:

startsWith(df1$col1, "0/0")
#[1]  TRUE  TRUE FALSE  TRUE
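
To handle all prefixes and both columns in one pass, a lookup on the part before the first ":" may be simpler than one replace call per prefix. This is a sketch under assumptions: the question only states the values for "0/0" and "0/1", so the mapping of "1/1" to 1 is a guess, and genotype_value and recode_genotype are hypothetical names introduced here for illustration.

```r
# Rows visible in the question; the elided rows are not reproduced
df1 <- data.frame(
  col1 = c("0/0:6:6,0", "0/0:6:6,0", "1/1:6:6,0", "0/0:8:8,0"),
  col2 = c("0/0:6:6,0", "0/1:6:6,0", "0/0:6:6,0", "0/0:8:8,0"),
  stringsAsFactors = FALSE
)

# Lookup table from prefix to value; the "1/1" entry is an assumption
genotype_value <- c("0/0" = 0, "0/1" = 0.5, "1/1" = 1)

# Hypothetical helper: keep the text before the first ":" and look it up.
# as.character() makes it safe for factor columns too;
# prefixes missing from the table come back as NA.
recode_genotype <- function(x) {
  prefix <- sub(":.*$", "", as.character(x))
  unname(genotype_value[prefix])
}

# Apply to every column while keeping the data frame structure
df1[] <- lapply(df1, recode_genotype)
df1
```

With the four rows above, col1 becomes 0, 0, 1, 0 and col2 becomes 0, 0.5, 0, 0, and both columns are numeric rather than character.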

Upvotes: 2
