Reputation: 305
I have a dataframe with strings in a column. How could I extract only the substrings that are in capital letters and add them to another column?
This is an example:
  fecha      incident
1 2020-12-01 Check GENERATOR
2 2020-12-01 Check BLADE
3 2020-12-02 Problem in GENERATOR
4 2020-12-01 Check YAW
5 2020-12-02 Alarm in SAFETY SYSTEM
And I would like to create another column as follows:
  fecha      incident               system
1 2020-12-01 Check GENERATOR        GENERATOR
2 2020-12-01 Check BLADE            BLADE
3 2020-12-02 Problem in GENERATOR   GENERATOR
4 2020-12-01 Check YAW              YAW
5 2020-12-02 Alarm in SAFETY SYSTEM SAFETY SYSTEM
I have tried str_sub or str_extract_all with a regex, but I believe I'm doing things wrong.
Upvotes: 4
Views: 5116
Reputation: 389325
If the upper-case words are not always next to each other, you can use str_extract_all to extract every run of two or more capital letters in a sentence and then paste the runs together.
sapply(stringr::str_extract_all(df$incident, '[A-Z]{2,}'),paste0, collapse = ' ')
#[1] "GENERATOR" "BLADE" "GENERATOR" "YAW" "SAFETY SYSTEM"
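To get the result into a new column, as the question asks, the same call can be assigned to the data frame. This is a minimal sketch: the data frame `df` is rebuilt here from the example in the question, and the column name `system` follows the desired output shown there.

```r
library(stringr)

# Sample data rebuilt from the question (an assumption about how df is stored)
df <- data.frame(
  fecha = c("2020-12-01", "2020-12-01", "2020-12-02", "2020-12-01", "2020-12-02"),
  incident = c("Check GENERATOR", "Check BLADE", "Problem in GENERATOR",
               "Check YAW", "Alarm in SAFETY SYSTEM"),
  stringsAsFactors = FALSE
)

# Extract every run of two or more capitals per row,
# then join the runs of each row with a single space
df$system <- sapply(str_extract_all(df$incident, "[A-Z]{2,}"),
                    paste, collapse = " ")
df$system
```

Requiring at least two capitals keeps the leading letter of words such as "Check" or "Alarm" out of the result.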
Upvotes: 0
Reputation: 1175
You can use str_extract if you want to work in a dataframe and tie it into a tidyverse workflow.
The regex matches either a capital letter or a whitespace character and requires two or more of them in a row (so it does not pick up words that are merely capitalized). str_trim removes the whitespace that gets picked up around the match when the upper-case word is not at the start or end of the string.
Note that this code snippet only extracts the first run of upper-case words connected by spaces. If there are upper-case words in different parts of the string, only the first one is returned.
library(tidyverse)
x <- c("CAPITAL and not Capital", "one more CAP word", "MULTIPLE CAPITAL words", "CAP words NOT connected")
cap <- str_trim(str_extract(x, "([:upper:]|[:space:]){2,}"))
cap
#> [1] "CAPITAL" "CAP" "MULTIPLE CAPITAL" "CAP"
Created on 2021-01-08 by the reprex package (v0.3.0)
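Tied into a data frame workflow, the same pattern could be used inside mutate. This is only a sketch: the data frame `incidents` is rebuilt from the question's example, and the names `incidents` and `result` are placeholders chosen for illustration.

```r
library(dplyr)
library(stringr)

# Sample data rebuilt from the question (hypothetical object name)
incidents <- data.frame(
  fecha = c("2020-12-01", "2020-12-01", "2020-12-02", "2020-12-01", "2020-12-02"),
  incident = c("Check GENERATOR", "Check BLADE", "Problem in GENERATOR",
               "Check YAW", "Alarm in SAFETY SYSTEM"),
  stringsAsFactors = FALSE
)

# Take the first run of upper-case letters and spaces in each row,
# then trim the whitespace captured around it
result <- incidents %>%
  mutate(system = str_trim(str_extract(incident, "([:upper:]|[:space:]){2,}")))

result$system
```

Because str_extract returns only the first match per row, this fits data like the example, where each incident names a single system.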
Upvotes: 7
Reputation: 486
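library(tidyverse)
string <- data.frame(test = "does this WORK")
# str_extract_all() returns a list, so `new` becomes a list column.
# Note that "[A-Z]+" also matches a single capital, e.g. the first letter of a sentence.
string$new <- str_extract_all(string$test, "[A-Z]+")
string
#>             test  new
#> 1 does this WORK WORK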
Upvotes: 0