ranaz
ranaz

Reputation: 97

Extracting part of string in list to populate a table

I have been trying to write a code that read from multiple excel file in folder match a name. I have been able to achieve that part using the code at the show some code. The columns in the dataframe is id and Date.

My issue is I want to add another column called Code that will hold a code extracted from the files list to differentiate each row.

Initial dataframe after reaading the file and combinning that datasets

id                          Date              
    ExcelFile/CP1213_.xlsx  2013-05-09        
    ExcelFile/CP1213_.xlsx  2013-01-30      
    ExcelFile/CP1314_.xlsx  2013-02-14        
    ExcelFile/CP1314_.xlsx  2013-03-19        
    ExcelFile/CP1415_.xlsx  2013-02-22      
    ExcelFile/CP1415_.xlsx  2013-02-22      

The table below shows what i want to achieve:

id                          Date            Code   
ExcelFile/CP1213_.xlsx      2013-05-09      CP1213  
ExcelFile/CP1213_.xlsx      2013-01-30      CP1213
ExcelFile/CP1314_.xlsx      2013-02-14      CP1314  
ExcelFile/CP1314_.xlsx      2013-03-19      CP1314  
ExcelFile/CP1415_.xlsx      2013-02-22      CP1415 
ExcelFile/CP1415_.xlsx      2013-02-22      CP1415

The output of the files is a list: "ExcelFile/CP1213_.xlsx" "ExcelFile/CP1314_.xlsx" "ExcelFile/CP1415_.xlsx"

files <- list.files(path = "ExcelFile/", pattern   = "*.xlsx", full.names = T)



tbl <- sapply(files, read_excel, simplify=FALSE) %>% bind_rows(.id = "id") 

Upvotes: 1

Views: 120

Answers (2)

ranaz
ranaz

Reputation: 97

Base on  Ronak Shah idea, you can use the mutate from the dplyr package then use the basename and then extract part of the filename using sub from the id.

files <- list.files(path = "ExcelFile/", pattern   = "*.xlsx", full.names = T)

tbl <- sapply(files, read_excel, simplify=FALSE) %>% bind_rows(.id = "id")

tbl <- tbl %>% mutate(Code = sub("_.*", "", basename(tbl$id)))

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389047

You can use basename and then extract part of the filename using sub

df$Code <- sub("_.*", "", basename(as.character(df$id)))

df
#                      id       Date   Code
#1 ExcelFile/CP1213_.xlsx 2013-05-09 CP1213
#2 ExcelFile/CP1213_.xlsx 2013-01-30 CP1213
#3 ExcelFile/CP1314_.xlsx 2013-02-14 CP1314
#4 ExcelFile/CP1314_.xlsx 2013-03-19 CP1314
#5 ExcelFile/CP1415_.xlsx 2013-02-22 CP1415
#6 ExcelFile/CP1415_.xlsx 2013-02-22 CP1415

Or if you want to extract it from files

df$Code <- sub("_.*", "", basename(files))

Upvotes: 0

Related Questions