firmo23
firmo23

Reputation: 8404

Fill new columns based on key words of an existing column

I have the dataframe below and I would like to create new columns based on key words in each row of the INFO column.

ID<-c(1,2,3,4)
INFO<-c("You used works apps for 4 minutes today.","You checked your phone 10 times today.",
        "Your commute time to work today was 4 minutes.","You (or at least your phone) were at your work place for 15 minutes today"
        )
DATASET<-data.frame(ID,INFO)

More specifically I have to check each row of the INFO for the words apps,phone,commute,or. Then one new column for each one will be created and it will have either NA or the number that exists in this row so my new dataset will be like:

DATASET2
  ID                                                                      INFO apps phone commute or
1  1                                  You used works apps for 4 minutes today.    4    NA      NA    NA
2  2                                    You checked your phone 10 times today.   NA    10      NA    NA
3  3                            Your commute time to work today was 4 minutes.   NA    NA       4    NA
4  4 You (or at least your phone) were at your work place for 15 minutes today   NA    NA      NA    15

Upvotes: 0

Views: 343

Answers (3)

akrun
akrun

Reputation: 886938

here is an option with base R

v1 <- as.numeric(gsub("\\D+",  "", DATASET$INFO))
words <-  c("apps", "phone", "commute", "place")
cbind(DATASET, sapply(words, function(x) {
     i1 <- grepl(x, DATASET$INFO)
     NA^(!(i1 & !duplicated(i1)))}) * v1)
#ID                                                                      INFO apps phone commute place
#1  1                                  You used works apps for 4 minutes today.    4    NA      NA    NA
#2  2                                    You checked your phone 10 times today.   NA    10      NA    NA
#3  3                            Your commute time to work today was 4 minutes.   NA    NA       4    NA
#4  4 You (or at least your phone) were at your work place for 15 minutes today   NA    NA      NA    15

Upvotes: 1

AndS.
AndS.

Reputation: 8110

Here is another option:

library(tidyverse)

DATASET %>%
  mutate(INFO = as.character(INFO),
         name =  str_extract_all(INFO, "apps|phone|commute|place"),
         number = parse_number(INFO)) %>%
  unnest_longer(col = name) %>%
  pivot_wider(names_from = "name", values_from = "number")
#> # A tibble: 4 x 6
#>      ID INFO                                            apps phone commute place
#>   <dbl> <chr>                                          <dbl> <dbl>   <dbl> <dbl>
#> 1     1 You used works apps for 4 minutes today.           4    NA      NA    NA
#> 2     2 You checked your phone 10 times today.            NA    10      NA    NA
#> 3     3 Your commute time to work today was 4 minutes.    NA    NA       4    NA
#> 4     4 You (or at least your phone) were at your wor…    NA    15      NA    15

Upvotes: 2

tmfmnk
tmfmnk

Reputation: 39858

One solution involving purrr, stringr and dplyr could be:

bind_cols(DATASET, 
          map_dfc(.x = c("apps", "phone", "commute", "place"),
                  ~ DATASET %>%
                   mutate(!!.x := ifelse(str_detect(INFO, .x), 
                                         str_extract_all(INFO, "\\d+"), 
                                         NA_character_)) %>%
                   select(.x)))

  ID                                                                      INFO apps phone commute place
1  1                                  You used works apps for 4 minutes today.    4  <NA>    <NA>  <NA>
2  2                                    You checked your phone 10 times today. <NA>    10    <NA>  <NA>
3  3                            Your commute time to work today was 4 minutes. <NA>  <NA>       4  <NA>
4  4 You (or at least your phone) were at your work place for 15 minutes today <NA>    15    <NA>    15

If there could be more than one number per row:

bind_cols(DATASET, 
          map_dfc(.x = c("apps", "phone", "commute", "place"),
                  ~ DATASET %>%
                   mutate(!!.x := map_chr(ifelse(str_detect(INFO, .x), 
                                                 str_extract_all(INFO, "\\d+"), 
                                                 NA_character_), 
                                          toString)) %>%
                   select(.x)))

Upvotes: 3

Related Questions