Remove the unmatched csv file between two folders before reading csv file

Question

I have two folders whose file names follow a certain pattern.

The "post" folder has 5 files: "aab.csv, bbc.csv, cfd.csv, f.csv, g.csv". The "comment" folder has 4 files: "aab_comment.csv, bbc_comment.csv, cfd_comment.csv, dgh_comment.csv".

They are big data files, so I want to read only the matched files and skip the unmatched ones, i.e. files whose front word does not appear in the other folder.

For example, aab, bbc, cfd in the "post" folder and aab_comment, bbc_comment, cfd_comment in the "comment" folder share the same front word. So I want to keep only the 3 files "aab.csv, bbc.csv, cfd.csv" in the file list of the post folder.

How can I make the modified_post_list (aab.csv, bbc.csv, cfd.csv)?

Below is what I have tried so far.

post_dir <- "c:/post/"
comment_dir <- "c:/comment/"
post <- list.files(post_dir)
#> 'aab.csv', 'bbc.csv', 'cfd.csv', 'f.csv', 'efg.csv', 'fgg.csv', 'gda.csv'

comment <- list.files(comment_dir)
#> 'abc_comment.csv', 'bcc_comment.csv', 'efg_comment.csv', 'fgg_comment.csv'

GKi · Accepted Answer

You can use sub to extract the front word of the file names and %in% to find the matches:

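x <- sub("(.*)\\..*", "\\1", post)     # drop the extension: "aab.csv" -> "aab"
y <- sub("(.*)_.*", "\\1", comment)    # drop everything from the last "_": "aab_comment.csv" -> "aab"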
post[x %in% y]
#[1] "aab.csv" "bbc.csv" "cfd.csv"
comment[y %in% x]
#[1] "aab_comment.csv" "bbc_comment.csv" "cfd_comment.csv"
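
As a variation on the same idea, here is a minimal sketch that builds the keys with tools::file_path_sans_ext and an anchored suffix pattern, and prints the intermediate keys. It assumes every comment file ends in exactly "_comment.csv"; the names post_key, comment_key and modified_comment_list are only illustrative and do not come from the question.

```r
post    <- c("aab.csv", "bbc.csv", "cfd.csv", "f.csv", "g.csv")
comment <- c("aab_comment.csv", "bbc_comment.csv", "cfd_comment.csv", "dgh_comment.csv")

# Key for a post file: the file name without its extension.
post_key <- tools::file_path_sans_ext(post)

# Key for a comment file: the file name without the fixed "_comment.csv" suffix
# (assumption: all comment files use exactly this suffix).
comment_key <- sub("_comment\\.csv$", "", comment)

print(post_key)
print(comment_key)

# Keep only the files whose key also occurs in the other folder.
modified_post_list    <- post[post_key %in% comment_key]
modified_comment_list <- comment[comment_key %in% post_key]

print(modified_post_list)
print(modified_comment_list)
```

Anchoring the pattern with $ means a name that does not end in "_comment.csv" is left unchanged rather than cut at its last underscore, so it simply fails to match any post key.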

Data:

post  <- c("aab.csv", "bbc.csv", "cfd.csv", "f.csv", "g.csv")
comment  <- c("aab_comment.csv", "bbc_comment.csv", "cfd_comment.csv", "dgh_comment.csv")
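
To go from the matched names to actually reading only those files, something like the following sketch may help. It is not part of the original answer: it creates two throwaway folders under tempdir() with tiny placeholder csv files so it can run anywhere, and the names keys, post_files, comment_files, post_data and comment_data are hypothetical. With real data, post_dir and comment_dir would point at the actual folders instead.

```r
# Throwaway folders with small placeholder files (assumption: stand-ins for the real data).
post_dir    <- file.path(tempdir(), "post")
comment_dir <- file.path(tempdir(), "comment")
dir.create(post_dir, showWarnings = FALSE)
dir.create(comment_dir, showWarnings = FALSE)
for (f in c("aab.csv", "bbc.csv", "cfd.csv", "f.csv", "g.csv"))
  write.csv(data.frame(id = 1:2), file.path(post_dir, f), row.names = FALSE)
for (f in c("aab_comment.csv", "bbc_comment.csv", "cfd_comment.csv", "dgh_comment.csv"))
  write.csv(data.frame(id = 1:2), file.path(comment_dir, f), row.names = FALSE)

# List only the csv files; this reads file names, not file contents.
post    <- list.files(post_dir, pattern = "\\.csv$")
comment <- list.files(comment_dir, pattern = "\\.csv$")

# Front word of each file name, as in the answer.
x <- sub("(.*)\\..*", "\\1", post)
y <- sub("(.*)_.*", "\\1", comment)

# Front words present in both folders, then the full paths of the matching files.
keys          <- intersect(x, y)
post_files    <- file.path(post_dir, post[match(keys, x)])
comment_files <- file.path(comment_dir, comment[match(keys, y)])

# Read only the matched files; each list is named by front word so pairs line up.
post_data    <- lapply(post_files, read.csv)
comment_data <- lapply(comment_files, read.csv)
names(post_data)    <- keys
names(comment_data) <- keys
```

Because both lists are built from the same keys vector, post_data[["aab"]] and comment_data[["aab"]] refer to the same pair of files.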
