Reputation: 284
I'm working with a data.frame that contains a column whose values look like this: D1_open, D9_shrub, D10_open, etc.
I would like to create a new column whose values are just "open" or "shrub". That is, I would like to extract the words "open" and "shrub" from "ID_SubPlot" and put them in a new column. I believe str_detect() can be useful, but I can't figure out how.
Example data:
test <- structure(list(ID_Plant = c(243, 370, 789, 143, 559, 588, 746,
618, 910, 898), ID_SubPlot = c("D1_open", "D9_shrub", "D8_open",
"E4_shrub", "U5_shrub", "U10_open", "S10_shrub", "U10_shrub",
"S9_shrub", "S9_shrub")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
Upvotes: 1
Views: 728
Reputation: 21908
This could also help you. I assumed you would like to remove the ID part plus the underscore:
library(dplyr)
library(stringr)
test %>%
  mutate(result = str_remove(ID_SubPlot, "^[A-Za-z]\\d+(_)"))
# A tibble: 10 x 3
   ID_Plant ID_SubPlot result
      <dbl> <chr>      <chr>
 1      243 D1_open    open
 2      370 D9_shrub   shrub
 3      789 D8_open    open
 4      143 E4_shrub   shrub
 5      559 U5_shrub   shrub
 6      588 U10_open   open
 7      746 S10_shrub  shrub
 8      618 U10_shrub  shrub
 9      910 S9_shrub   shrub
10      898 S9_shrub   shrub
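Since the question asks about str_detect(), here is a minimal sketch of how it could fit in. It assumes the column only ever holds the two categories "open" and "shrub"; the column names habitat_detect and habitat_extract and the four-row sample are made up for illustration.

```r
library(dplyr)
library(stringr)

# Small illustrative sample in the same shape as the question's data
test <- data.frame(
  ID_Plant   = c(243, 370, 588, 746),
  ID_SubPlot = c("D1_open", "D9_shrub", "U10_open", "S10_shrub"),
  stringsAsFactors = FALSE
)

res <- test %>%
  mutate(
    # str_detect() only returns TRUE/FALSE, so if_else() supplies the labels
    habitat_detect  = if_else(str_detect(ID_SubPlot, "_open$"), "open", "shrub"),
    # str_extract() returns the matched text itself: everything after the last "_"
    habitat_extract = str_extract(ID_SubPlot, "[^_]+$")
  )

res
```

The str_detect() version needs the labels hard-coded, so str_extract() or str_remove() is probably the better fit if more categories might appear later.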
Upvotes: 1
Reputation: 30474
Here is one approach using separate from tidyr:
library(tidyr)
separate(test, ID_SubPlot, into = c("Code", "NewCol"), sep = "_")
Output
   ID_Plant Code NewCol
1       243   D1   open
2       370   D9  shrub
3       789   D8   open
4       143   E4  shrub
5       559   U5  shrub
6       588  U10   open
7       746  S10  shrub
8       618  U10  shrub
9       910   S9  shrub
10      898   S9  shrub
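Note that separate() drops ID_SubPlot by default, as the output above shows. If the original column should stay alongside the new one, a sketch using the remove argument might look like this (the three-row sample is only for illustration):

```r
library(tidyr)

# Small illustrative sample in the same shape as the question's data
test <- data.frame(
  ID_Plant   = c(243, 370, 588),
  ID_SubPlot = c("D1_open", "D9_shrub", "U10_open"),
  stringsAsFactors = FALSE
)

# remove = FALSE keeps ID_SubPlot next to the two new columns
res <- separate(test, ID_SubPlot, into = c("Code", "NewCol"),
                sep = "_", remove = FALSE)

res
```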
Upvotes: 2
Reputation: 971
Simply use ".*_(.*)" to capture everything after the _ in the first group, then replace each string with that captured group.
test$col = gsub(".*_(.*)", "\\1", test$ID_SubPlot)
test
   ID_Plant ID_SubPlot   col
1       243    D1_open  open
2       370   D9_shrub shrub
3       789    D8_open  open
4       143   E4_shrub shrub
5       559   U5_shrub shrub
6       588   U10_open  open
7       746  S10_shrub shrub
8       618  U10_shrub shrub
9       910   S9_shrub shrub
10      898   S9_shrub shrub
test=structure(list(ID_Plant = c(243, 370, 789, 143, 559, 588, 746, 618, 910, 898),
ID_SubPlot = c("D1_open", "D9_shrub", "D8_open", "E4_shrub", "U5_shrub", "U10_open", "S10_shrub", "U10_shrub", "S9_shrub", "S9_shrub")),
row.names = c(NA, -10L), class = c("data.frame"))
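One detail worth knowing about the pattern above: ".*" is greedy, so it matches up to the last underscore. That makes no difference for the sample data, where each value has exactly one underscore. The sketch below shows how the behaviour would differ from a "first underscore" pattern; the value "D1_open_north" is hypothetical and does not occur in the question's data.

```r
# Two values from the question plus one hypothetical value with two underscores
x <- c("D1_open", "S10_shrub", "D1_open_north")

# Greedy ".*_" consumes everything through the LAST underscore
after_last <- sub(".*_", "", x)

# "[^_]*_" anchored at the start stops at the FIRST underscore
after_first <- sub("^[^_]*_", "", x)

after_last   # "open" "shrub" "north"
after_first  # "open" "shrub" "open_north"
```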
Upvotes: 1