akh22

Reputation: 701

R: Replace part of the string in one column with a string in another column in data.table

I have the following data.table with two columns:

library(data.table)
dt1 <- as.data.table(data.frame(
    relative.1 = c("up", "down", "up", "down", "down", 
      "up", "up", "up", "down", "down"), 
    color.1 = c(
        "<span style=     color: red !important; >0.00239377213823793</span>", 
        "<span style=     color: red !important; >0.0189475913373258</span>", 
        "<span style=     color: red !important; >0.000944874682014027</span>", 
        "<span style=     color: red !important; >0.00115563834695583</span>", 
        "<span style=     color: red !important; >0.00190146895689528</span>", 
        "<span style=     color: red !important; >0.00905363339565874</span>", 
        "<span style=     color: red !important; >0.00786719465124788</span>", 
        "<span style=     color: red !important; >0.0021806607355806</span>", 
        "<span style=     color: black !important; >0.0677967189492317</span>", 
        "<span style=     color: black !important; >0.0643565809998716</span>"
    ), stringsAsFactors = FALSE))

In each row of "color.1", I would like to replace the number between ">" and "<" with the string from the same row of "relative.1". For example, in the first row, I'd like to replace "0.00239377213823793" with "up".

I'd appreciate any pointers.

Upvotes: 1

Views: 286

Answers (2)

s_baldur

Reputation: 33498

dt1[, color.1 := sub('(?<=>)[0-9]+\\.[0-9]+(?=<)', relative.1, color.1, perl = TRUE), by = relative.1]

#     relative.1                                                color.1
#  1:         up     <span style=     color: red !important; >up</span>
#  2:       down   <span style=     color: red !important; >down</span>
#  3:         up     <span style=     color: red !important; >up</span>
#  4:       down   <span style=     color: red !important; >down</span>
#  5:       down   <span style=     color: red !important; >down</span>
#  6:         up     <span style=     color: red !important; >up</span>
#  7:         up     <span style=     color: red !important; >up</span>
#  8:         up     <span style=     color: red !important; >up</span>
#  9:       down <span style=     color: black !important; >down</span>
# 10:       down <span style=     color: black !important; >down</span>
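
A note on why the grouping matters: base R's sub() is not vectorised over its replacement argument, so by = relative.1 ensures each call sees a single replacement value. As a rough sketch of an alternative without grouping, mapply() can pair each string with its own replacement. The two-row table dt below is a shortened stand-in for dt1, made up for illustration.

```r
library(data.table)

# Shortened, hypothetical stand-in for dt1 (values truncated for readability)
dt <- data.table(
  relative.1 = c("up", "down"),
  color.1 = c(
    "<span style=     color: red !important; >0.0024</span>",
    "<span style=     color: black !important; >0.0678</span>"
  )
)

# mapply() walks color.1 and relative.1 in parallel, so every call to sub()
# receives exactly one string and one replacement
dt[, color.1 := mapply(
  function(s, r) sub("(?<=>)[0-9]+\\.[0-9]+(?=<)", r, s, perl = TRUE),
  color.1, relative.1,
  USE.NAMES = FALSE
)]

dt$color.1
```

Grouping is likely faster when relative.1 has only a few distinct values, since sub() then runs once per group rather than once per row.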

Upvotes: 0

CSJCampbell

Reputation: 2115

The data.table package provides the := operator, which updates columns in place (by reference) and is therefore efficient. Inside the brackets you can refer to other columns of the data.table by name. There are various ways of editing strings, and while regular expressions are not suitable for parsing HTML in general, the following pattern works for the example you have here. Because stringr::str_replace is vectorised over its replacement argument, each row receives the value from its own row of relative.1.

dt1[, color.1 := stringr::str_replace(
    string = color.1, 
    pattern = "[0-9.]+", 
    replacement = relative.1)]
dt1
#     relative.1                                                color.1
#  1:         up     <span style=     color: red !important; >up</span>
#  2:       down   <span style=     color: red !important; >down</span>
#  3:         up     <span style=     color: red !important; >up</span>
#  4:       down   <span style=     color: red !important; >down</span>
#  5:       down   <span style=     color: red !important; >down</span>
#  6:         up     <span style=     color: red !important; >up</span>
#  7:         up     <span style=     color: red !important; >up</span>
#  8:         up     <span style=     color: red !important; >up</span>
#  9:       down <span style=     color: black !important; >down</span>
# 10:       down <span style=     color: black !important; >down</span>
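
One caveat: the pattern "[0-9.]+" replaces the first run of digits or dots anywhere in the string, so it relies on the style attribute containing none. As a hedged sketch, lookarounds can restrict the match to the text between ">" and "<". The table dt and its "font-size: 12px" style are made up for illustration and do not come from the question.

```r
library(data.table)

# Hypothetical input: the first row has a digit inside the style attribute
dt <- data.table(
  relative.1 = c("up", "down"),
  color.1 = c(
    "<span style=     font-size: 12px; >0.0024</span>",
    "<span style=     color: black !important; >0.0678</span>"
  )
)

# (?<=>) and (?=<) require the match to sit directly between ">" and "<",
# so the "12" in "12px" is left alone
dt[, color.1 := stringr::str_replace(
    string = color.1,
    pattern = "(?<=>)[0-9.]+(?=<)",
    replacement = relative.1)]

dt$color.1
```

With the plain "[0-9.]+" pattern, the first match in row one would be "12", giving "font-size: uppx" and leaving the number untouched.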

Upvotes: 1
