Filipe Dias

Reputation: 284

Use str_detect() to extract information from a column and then create a new column

I'm working with a data.frame that contains a column whose values are named like this: D1_open, D9_shrub, D10_open, etc.


I would like to create a new column whose values are just "open" or "shrub". That is, I would like to extract the words "open" and "shrub" from "ID_SubPlot" and put them in a new column. I believe str_detect() can be useful, but I can't figure out how.

Example data:

test <- structure(list(ID_Plant = c(243, 370, 789, 143, 559, 588, 746, 
618, 910, 898), ID_SubPlot = c("D1_open", "D9_shrub", "D8_open", 
"E4_shrub", "U5_shrub", "U10_open", "S10_shrub", "U10_shrub", 
"S9_shrub", "S9_shrub")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

Upvotes: 1

Views: 728

Answers (3)

Anoushiravan R

Reputation: 21908

This could also help you. I assumed you would like to remove the ID part plus the underscore:

library(dplyr)
library(stringr)

test %>%
  mutate(result = str_remove(ID_SubPlot, "^[A-Za-z]\\d+_"))

# A tibble: 10 x 3
   ID_Plant ID_SubPlot result
      <dbl> <chr>      <chr> 
 1      243 D1_open    open  
 2      370 D9_shrub   shrub 
 3      789 D8_open    open  
 4      143 E4_shrub   shrub 
 5      559 U5_shrub   shrub 
 6      588 U10_open   open  
 7      746 S10_shrub  shrub 
 8      618 U10_shrub  shrub 
 9      910 S9_shrub   shrub 
10      898 S9_shrub   shrub 
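
Since the question asks about str_detect() specifically: it returns TRUE/FALSE rather than the matched text, so on its own it cannot extract anything and has to be paired with something like if_else(). The sketch below is only an illustration under the assumption that "open" and "shrub" are the only two labels; the `plots` tibble and the column names `via_detect` and `via_extract` are made up for the example. str_extract() is shown next to it because it returns the matched word directly.

```r
library(dplyr)
library(stringr)

# Small stand-in for the question's data (same ID_SubPlot format)
plots <- tibble(ID_SubPlot = c("D1_open", "D9_shrub", "U10_open", "S10_shrub"))

result <- plots %>%
  mutate(
    # str_detect() gives a logical vector, so map TRUE/FALSE to labels;
    # assumption: anything that does not contain "open" is a "shrub" plot
    via_detect = if_else(str_detect(ID_SubPlot, "open"), "open", "shrub"),
    # str_extract() returns the matched text itself, or NA if nothing matches
    via_extract = str_extract(ID_SubPlot, "open|shrub")
  )

result
```

With more than two labels, the str_extract() form is probably the easier one to extend, since the pattern just gains another alternative.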

Upvotes: 1

Ben

Reputation: 30474

Here is one approach using separate from tidyr:

library(tidyr)

separate(test, ID_SubPlot, into = c("Code", "NewCol"), sep = "_")

Output

   ID_Plant Code NewCol
1       243   D1   open
2       370   D9  shrub
3       789   D8   open
4       143   E4  shrub
5       559   U5  shrub
6       588  U10   open
7       746  S10  shrub
8       618  U10  shrub
9       910   S9  shrub
10      898   S9  shrub

Upvotes: 2

Ben373

Reputation: 971

Regex (see also regex cheatsheet for R)

Use the pattern ".*_(.*)": the greedy .*_ matches everything up to and including the last underscore, and the group (.*) captures whatever follows it. Replacing the whole match with "\\1" leaves only the captured text.

test$col = gsub(".*_(.*)", "\\1", test$ID_SubPlot)
test
   ID_Plant ID_SubPlot   col
1       243    D1_open  open
2       370   D9_shrub shrub
3       789    D8_open  open
4       143   E4_shrub shrub
5       559   U5_shrub shrub
6       588   U10_open  open
7       746  S10_shrub shrub
8       618  U10_shrub shrub
9       910   S9_shrub shrub
10      898   S9_shrub shrub
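
One detail of the pattern above that may matter for other data: because .* is greedy, an ID containing more than one underscore keeps only the part after the last one. A small sketch of that behaviour, using a hypothetical vector `ids` (the value "A1_b_c" is invented for illustration and does not occur in the question's data), next to a variant that cuts at the first underscore instead:

```r
# Hypothetical inputs: one regular ID, one with two underscores, one with none
ids <- c("D1_open", "A1_b_c", "nounderscore")

# Greedy ".*_" runs up to the LAST underscore; strings without "_" are unchanged
after_last <- gsub(".*_(.*)", "\\1", ids)

# "[^_]*" cannot cross an underscore, so this removes up to the FIRST one
after_first <- sub("^[^_]*_", "", ids)

after_last
after_first
```

For the data in the question both variants give the same result, since every ID_SubPlot value has exactly one underscore.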

Data

test=structure(list(ID_Plant = c(243, 370, 789, 143, 559, 588, 746, 618, 910, 898), 
ID_SubPlot = c("D1_open", "D9_shrub", "D8_open", "E4_shrub", "U5_shrub", "U10_open", "S10_shrub", "U10_shrub", "S9_shrub", "S9_shrub")), 
row.names = c(NA, -10L), class = c("data.frame"))

Upvotes: 1
