Reputation: 8404
I have the dataframe below and I would like to create new columns based on key words in each row of the INFO
column.
ID<-c(1,2,3,4)
INFO<-c("You used works apps for 4 minutes today.","You checked your phone 10 times today.",
"Your commute time to work today was 4 minutes.","You (or at least your phone) were at your work place for 15 minutes today"
)
DATASET<-data.frame(ID,INFO)
More specifically I have to check each row of the INFO
for the words apps,phone,commute,or
. Then one new column for each one will be created and it will have either NA
or the number that exists in this row so my new dataset will be like:
DATASET2
ID INFO apps phone commute or
1 1 You used works apps for 4 minutes today. 4 NA NA NA
2 2 You checked your phone 10 times today. NA 10 NA NA
3 3 Your commute time to work today was 4 minutes. NA NA 4 NA
4 4 You (or at least your phone) were at your work place for 15 minutes today NA NA NA 15
Upvotes: 0
Views: 343
Reputation: 886938
here is an option with base R
v1 <- as.numeric(gsub("\\D+", "", DATASET$INFO))
words <- c("apps", "phone", "commute", "place")
cbind(DATASET, sapply(words, function(x) {
i1 <- grepl(x, DATASET$INFO)
NA^(!(i1 & !duplicated(i1)))}) * v1)
#ID INFO apps phone commute place
#1 1 You used works apps for 4 minutes today. 4 NA NA NA
#2 2 You checked your phone 10 times today. NA 10 NA NA
#3 3 Your commute time to work today was 4 minutes. NA NA 4 NA
#4 4 You (or at least your phone) were at your work place for 15 minutes today NA NA NA 15
Upvotes: 1
Reputation: 8110
Here is another option:
library(tidyverse)
DATASET %>%
mutate(INFO = as.character(INFO),
name = str_extract_all(INFO, "apps|phone|commute|place"),
number = parse_number(INFO)) %>%
unnest_longer(col = name) %>%
pivot_wider(names_from = "name", values_from = "number")
#> # A tibble: 4 x 6
#> ID INFO apps phone commute place
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 You used works apps for 4 minutes today. 4 NA NA NA
#> 2 2 You checked your phone 10 times today. NA 10 NA NA
#> 3 3 Your commute time to work today was 4 minutes. NA NA 4 NA
#> 4 4 You (or at least your phone) were at your wor… NA 15 NA 15
Upvotes: 2
Reputation: 39858
One solution involving purrr
, stringr
and dplyr
could be:
bind_cols(DATASET,
map_dfc(.x = c("apps", "phone", "commute", "place"),
~ DATASET %>%
mutate(!!.x := ifelse(str_detect(INFO, .x),
str_extract_all(INFO, "\\d+"),
NA_character_)) %>%
select(.x)))
ID INFO apps phone commute place
1 1 You used works apps for 4 minutes today. 4 <NA> <NA> <NA>
2 2 You checked your phone 10 times today. <NA> 10 <NA> <NA>
3 3 Your commute time to work today was 4 minutes. <NA> <NA> 4 <NA>
4 4 You (or at least your phone) were at your work place for 15 minutes today <NA> 15 <NA> 15
If there could be more than one number per row:
bind_cols(DATASET,
map_dfc(.x = c("apps", "phone", "commute", "place"),
~ DATASET %>%
mutate(!!.x := map_chr(ifelse(str_detect(INFO, .x),
str_extract_all(INFO, "\\d+"),
NA_character_),
toString)) %>%
select(.x)))
Upvotes: 3