KikiZ

Reputation: 59

In R loop through rows if condition is met and string value is contained in a character vector set new column value to the character vector element

I have a character vector of file paths, each of which always contains a company name. I also have a data frame with a column that contains the company name.

I want to be able to check firstly that the row contains the value 'Title' in the df$style_name column. Then I want to see if the company name from the data frame is in the filepath from the character vector.

If so, assign a new column df$record to contain the corresponding filepath.

This is the character vector.

filenames <- list.files(path = dir, pattern = "*.docx|*.DOCX", full.names = TRUE)
> filenames
[1] "C:/Temp/data/D21 248694  Company Data - ABC Co - August 2021.DOCX"                            
[2] "C:/Temp/data/D21 248706  Company Data – XYZ Limited – September 2021.DOCX"

The data frame I currently have.

style_name   text         record
Title        ABC Co       NA
List Bullet  blah blah    NA
List Bullet  blah blah    NA
Title        XYZ Limited  NA
List Bullet  blah blah    NA

The data frame I am after.

style_name   text         record
Title        ABC Co       C:/Temp/data/D21 248694 Company Data - ABC Co - August 2021.DOCX
List Bullet  blah blah    NA
List Bullet  blah blah    NA
Title        XYZ Limited  C:/Temp/data/D21 248706 Company Data – XYZ Limited – September 2021.DOCX
List Bullet  blah blah    NA

This is my current code. I think the for loop is wrong, because it only populates the row that matches the last filepath in the vector.

  for (file in filenames) {
       df$record <- ifelse((df$style_name == 'Title' & str_detect(tolower(file),tolower(df$text))), file, NA)
  }

Upvotes: 0

Views: 1442

Answers (3)

GuedesBF

Reputation: 9858

We can use dplyr, tidyr, stringr, and purrr (basically the entire tidyverse).

library(tidyverse)

df %>%
  mutate(record = ifelse(style_name == 'Title',
                         map(text, ~ filenames[str_detect(filenames, .x)]),
                         NA)) %>%
  unnest(cols = record, keep_empty = TRUE)

# A tibble: 5 x 3
  style_name  text      record                                                                   
  <chr>       <chr>     <chr>                                                                    
1 Title       ABC Co    C:/Temp/data/D21 248694  Company Data - ABC Co - August 2021.DOCX        
2 List Bullet blah blah NA                                                                       
3 List Bullet blah blah NA                                                                       
4 Title       XYZ       C:/Temp/data/D21 248706  Company Data – XYZ Limited – September 2021.DOCX
5 List Bullet blah blah NA 

Upvotes: 0

Ronak Shah

Reputation: 388862

You can try:

#Initialise record column to NA
df$record <- NA
#get the row numbers where style_name is 'Title'
inds <- which(df$style_name == 'Title')
#For each index, find the filenames that match and keep the first one.
for(i in inds) {
  val <- grep(df$text[i], filenames, value = TRUE)
  if(length(val)) df$record[i] <- val[1]
}
df

#   style_name        text                                                                    record
#1       Title      ABC Co         C:/Temp/data/D21 248694  Company Data - ABC Co - August 2021.DOCX
#2 List Bullet   blah blah                                                                      <NA>
#3 List Bullet   blah blah                                                                      <NA>
#4       Title XYZ Limited C:/Temp/data/D21 248706  Company Data – XYZ Limited – September 2021.DOCX
#5 List Bullet   blah blah                                                                      <NA>
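
The question lower-cases both sides before matching, so a case-insensitive variant of the same idea may be useful. The following is only a sketch under stated assumptions: `first_match` is a hypothetical helper (not part of any package), the data frame and `filenames` are rebuilt from the question so the snippet runs on its own, and `fixed = TRUE` is used so that company names are matched literally rather than as regular expressions.

```r
# Hypothetical helper (an assumption, not from the original answer):
# returns the first file path containing `name`, ignoring case,
# or NA when nothing matches.
first_match <- function(name, files) {
  # fixed = TRUE treats `name` as a literal string, not a regex
  hits <- files[grepl(tolower(name), tolower(files), fixed = TRUE)]
  if (length(hits) > 0) hits[1] else NA_character_
}

# Sample data copied from the question
df <- data.frame(
  style_name = c("Title", "List Bullet", "List Bullet", "Title", "List Bullet"),
  text = c("ABC Co", "blah blah", "blah blah", "XYZ Limited", "blah blah")
)
filenames <- c(
  "C:/Temp/data/D21 248694  Company Data - ABC Co - August 2021.DOCX",
  "C:/Temp/data/D21 248706  Company Data – XYZ Limited – September 2021.DOCX"
)

# Only 'Title' rows are looked up; every other row keeps NA
df$record <- NA_character_
is_title <- df$style_name == "Title"
df$record[is_title] <- vapply(df$text[is_title], first_match, character(1),
                              files = filenames, USE.NAMES = FALSE)
```

Keeping the lookup in a small function makes it easy to change the matching rule (for example, dropping `fixed = TRUE` if regular expressions are actually wanted) without touching the code that fills the column.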

Upvotes: 1

Alejo

Reputation: 325

Try this:

# your columns
style_name = c("Title", "List Bullet", "List Bullet", "Title", "List Bullet")
text = c("ABC Co", "blah blah", "blah blah", "XYZ", "blah blah")

# The filenames
filenames = c("C:/Temp/data/D21 248694  Company Data - ABC Co - August 2021.DOCX",
              "C:/Temp/data/D21 248706  Company Data – XYZ Limited – September 2021.DOCX")
# create the data frame
df = data.frame(style_name, text)

# Create the record column
df$record = NA

# The for loop: only 'Title' rows are matched against the filenames
for(i in seq_len(nrow(df))){
  hits = filenames[grepl(df$text[i], filenames)]
  if(df$style_name[i] == "Title" && length(hits) > 0) df$record[i] = hits[1]
}

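grepl tests whether a pattern occurs in each element of a character vector. The pattern is a regular expression by default, so a plain string behaves like a substring test unless it contains regex metacharacters.

If a string sss is a substring of any string in a vector of strings SSS, then sum(grepl(sss, SSS)) > 0.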
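
To illustrate the test described above, here is a small sketch with made-up file names (they are assumptions for illustration, not from the question). It shows that `grepl` interprets its pattern as a regular expression unless `fixed = TRUE` is passed, and that `any()` can stand in for `sum(...) > 0`.

```r
# Made-up file names, for illustration only
files <- c("report - A.B Co - 2021.docx", "report - AXB Co - 2021.docx")

# Default: the pattern is a regular expression, so "." matches any character
regex_hits <- grepl("A.B Co", files)

# fixed = TRUE: the pattern is matched literally, as a plain substring
literal_hits <- grepl("A.B Co", files, fixed = TRUE)

# any() expresses "at least one match" more directly than sum(...) > 0
has_match <- any(literal_hits)
```

With company names that contain characters such as `.` or `(`, the literal form is usually the safer choice.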

Upvotes: 0
