user12388610

Reputation: 207

Remove the unmatched CSV files between two folders before reading them

I have two folders whose file names follow a fixed pattern.

The "post" folder has 5 files: "aab.csv, bbc.csv, cfd.csv, f.csv, g.csv". The "comment" folder has 4 files: "aab_comment.csv, bbc_comment.csv, cfd_comment.csv, dgh_comment.csv".

The files are large, so before reading them I want to keep only the matched files and skip any file whose front word (the part before ".csv" or "_comment.csv") does not appear in the other folder.

For example, aab, bbc and cfd in the "post" folder have the same front word as aab_comment, bbc_comment and cfd_comment in the "comment" folder. So the file list for the post folder should contain only 3 files: "aab.csv, bbc.csv, cfd.csv".

How can I build this modified_post_list (aab.csv, bbc.csv, cfd.csv)?

Below is what I have tried so far.

post_dir <- "c:/post/"
comment_dir <- "c:/comment/"
post <- list.files(post_dir)
#> "aab.csv" "bbc.csv" "cfd.csv" "f.csv" "g.csv"

comment <- list.files(comment_dir)
#> "aab_comment.csv" "bbc_comment.csv" "cfd_comment.csv" "dgh_comment.csv"

Upvotes: 1

Views: 63

Answers (1)

GKi

Reputation: 39657

You can use sub to extract the front word of the file names and %in% to find the matches:

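# Strip the extension from the post names: "aab.csv" -> "aab"
x <- sub("(.*)\\..*", "\\1", post)
# Strip everything from the last underscore on: "aab_comment.csv" -> "aab"
y <- sub("(.*)_.*", "\\1", comment)
# Post files whose front word also occurs among the comment files
post[x %in% y]
#[1] "aab.csv" "bbc.csv" "cfd.csv"
# Comment files whose front word also occurs among the post files
comment[y %in% x]
#[1] "aab_comment.csv" "bbc_comment.csv" "cfd_comment.csv"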

Data:

post  <- c("aab.csv", "bbc.csv", "cfd.csv", "f.csv", "g.csv")
comment  <- c("aab_comment.csv", "bbc_comment.csv", "cfd_comment.csv", "dgh_comment.csv")
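
As a variation on the same idea, here is a minimal sketch that wraps the matching in a function and returns both lists in the same order, so that the i-th post file lines up with the i-th comment file. The function name `match_post_comment` and its `suffix` argument are hypothetical names chosen for illustration. It assumes every comment file ends in "_comment" before the extension and that the suffix contains no regex special characters.

```r
# Hypothetical helper: returns the matched file names from both folders,
# aligned by their shared front word.
match_post_comment <- function(post, comment, suffix = "_comment") {
  # "aab.csv" -> "aab"
  post_key <- tools::file_path_sans_ext(post)
  # "aab_comment.csv" -> "aab_comment" -> "aab"
  comment_key <- sub(paste0(suffix, "$"), "", tools::file_path_sans_ext(comment))
  # Front words present in both folders, in the order of the post files
  common <- intersect(post_key, comment_key)
  list(
    post    = post[match(common, post_key)],
    comment = comment[match(common, comment_key)]
  )
}

post    <- c("aab.csv", "bbc.csv", "cfd.csv", "f.csv", "g.csv")
comment <- c("aab_comment.csv", "bbc_comment.csv", "cfd_comment.csv", "dgh_comment.csv")

matched <- match_post_comment(post, comment)
matched$post
#[1] "aab.csv" "bbc.csv" "cfd.csv"
matched$comment
#[1] "aab_comment.csv" "bbc_comment.csv" "cfd_comment.csv"

# With the real folders, only the matched files would then be read, e.g.:
# post_data <- lapply(file.path(post_dir, matched$post), read.csv)
```

Because the two vectors are aligned, a post file and its comment file can be processed as a pair, for example with `Map()` over `matched$post` and `matched$comment`.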

Upvotes: 2
