Reputation: 207
I have two folders whose file names follow a certain pattern.
The "post" folder has 5 files: "aab.csv, bbc.csv, cfd.csv, f.csv, g.csv". The "comment" folder has 4 files: "aab_comment.csv, bbc_comment.csv, cfd_comment.csv, dgh_comment.csv".
These are large data files, so before reading them I want to select only the matched files and skip any file whose leading word has no counterpart in the other folder.
For example, aab, bbc, and cfd in the "post" folder share their leading word with aab_comment, bbc_comment, and cfd_comment in the "comment" folder. So I want the post file list to contain only the 3 files "aab.csv, bbc.csv, cfd.csv".
How can I make the modified_post_list (aab.csv, bbc.csv, cfd.csv)?
Below is what I have tried so far.
post_dir <- "c:/post/"
comment_dir <- "c:/comment/"
post <- list.files(post_dir)
#> 'aab.csv', 'bbc.csv', 'cfd.csv', 'f.csv', 'efg.csv', 'fgg.csv', 'gda.csv'
comment <- list.files(comment_dir)
#> 'abc_comment.csv', 'bcc_comment.csv', 'efg_comment.csv', 'fgg_comment.csv'
Upvotes: 1
Views: 63
Reputation: 39657
You can use sub() to extract the front word of the file names and %in% to find the matches:
x <- sub("(.*)\\..*", "\\1", post)
y <- sub("(.*)_.*", "\\1", comment)
post[x %in% y]
#[1] "aab.csv" "bbc.csv" "cfd.csv"
comment[y %in% x]
#[1] "aab_comment.csv" "bbc_comment.csv" "cfd_comment.csv"
Data:
post <- c("aab.csv", "bbc.csv", "cfd.csv", "f.csv", "g.csv")
comment <- c("aab_comment.csv", "bbc_comment.csv", "cfd_comment.csv", "dgh_comment.csv")
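The same idea can be wrapped in a small function. This is only a sketch: match_post_files and its suffix argument are hypothetical names, not part of base R, and it assumes every comment file is named "<front word>_comment.<extension>" and that the suffix contains no regex metacharacters. It uses tools::file_path_sans_ext() to drop the extension and basename() to ignore any directory part.

```r
# Hypothetical helper: keep only the post files whose front word also
# appears as "<front word>_comment" among the comment files.
match_post_files <- function(post, comment, suffix = "_comment") {
  # Strip directory and extension: "c:/post/aab.csv" -> "aab"
  post_stem <- tools::file_path_sans_ext(basename(post))
  # Strip directory, extension, and trailing suffix: "aab_comment.csv" -> "aab"
  comment_stem <- sub(paste0(suffix, "$"), "",
                      tools::file_path_sans_ext(basename(comment)))
  post[post_stem %in% comment_stem]
}

post <- c("aab.csv", "bbc.csv", "cfd.csv", "f.csv", "g.csv")
comment <- c("aab_comment.csv", "bbc_comment.csv", "cfd_comment.csv", "dgh_comment.csv")

modified_post_list <- match_post_files(post, comment)
modified_post_list
#[1] "aab.csv" "bbc.csv" "cfd.csv"
```

Because the function only looks at basename(), it should also work on the result of list.files(post_dir, full.names = TRUE), in which case the returned full paths can be passed straight to read.csv().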
Upvotes: 2