Reputation: 83
I have multiple csv files namely test9 to test15. All the file has the similar structure below
data
structure(list(x = structure(c(7L, 7L, 8L, 8L, 9L, 9L, 5L, 5L,
1L, 2L, 3L, 4L, 6L), .Label = c("CASE TEST 9 1A", "CASE TEST 9 2A",
"CASE TEST 9 3A", "CASE TEST 9 4A", "CASE TEST 9 BOX ", "CASE TEST 9 Delta",
"CASE TEST 9 Inlet ", "CASE TEST 9 Outlet ", "CASE TEST 9 Total "
), class = "factor"), y = structure(c(9L, 10L, 10L, 3L, 3L, 5L,
4L, 1L, 6L, 8L, 2L, 8L, 7L), .Label = c("CASE TEST 9 1A", "CASE TEST 9 4A",
"CASE TEST 9 BOX ", "CASE TEST 9 BOX Flow", "CASE TEST 9 BOX Total ",
"CASE TEST 9 Gett", "CASE TEST 9 Inlet", "CASE TEST 9 Outlet",
"CASE TEST 9 Outlet ", "CASE TEST 9 Total "), class = "factor"),
cdf = c(0.944604888, 0.874809846, 0.691878303, 0.524043157,
0.943973621, 0.687799043, 0.5105407, 0.989077927, -0.58787388,
0.036798229, 0.780733881, 0.116990659, 0.770803109), abscorr = c(0.944604888,
0.874809846, 0.691878303, 0.524043157, 0.943973621, 0.687799043,
0.5105407, 0.989077927, 0.58787388, 0.036798229, 0.780733881,
0.116990659, 0.770803109)), .Names = c("x", "y", "cdf", "abscorr"
), class = "data.frame", row.names = c(NA, -13L))
I like to loop through each files and remove the text "CASE TEST 'z'" in column x and y. I understand I can use gsub to remove them for each dataframe, but if I were to use a loop, how I can make sure that the string to remove will change accordingly to the file name.
Output should look something like this for each dataframes:
Appreciate all your help.
Upvotes: 1
Views: 87
Reputation: 16978
How would my workflow on this look like?
.csv
files to be read in. If there are multiple files I prefer storing them in a list instead of multiple, single data.frames. A list is easier to handle and we can avoid get
-assign
-structures in an automated/loop workflow.# set work directory if necessary
# setwd("C:/testfiles")
files <- list.files(pattern = "csv$")
.csv
files. I prefer using readr
but base Rs read.csv
is okay.# readr version
# test_files <- lapply(files, function(x) readr::read_csv2(x))
test_files <- lapply(files, function(x) read.csv2(x))
names(test_files) <- gsub("\\.csv", "", files)
# base R version
result <- Map(
function(file) {
transform(file,
x = gsub("(^\\s*|CASE TEST \\d+\\s*|\\s*$)", "", x),
y = gsub("(^\\s*|CASE TEST \\d+\\s*|\\s*$)", "", y)
)
},
test_files
)
# tidyverse version
library(dplyr)
library(stringr)
library(purrr)
result <- test_files %>%
map(
~.x %>%
mutate(
across(c(x, y), ~str_trim(str_remove(.x, "CASE TEST \\d+\\s*")))
)
)
# test9
result[["test9"]]
# or
result$test9
# test10
result[["test10"]]
# or
result$test10
Upvotes: 2
Reputation: 21908
I think this could also work fine for your purpose. I noticed there are some trailing white spaces in your values in variable x
and y
so the last line of code just makes sure they are removed.
library(dplyr)
df %>%
mutate(across(c(x, y), ~
gsub("CASE\\sTEST\\s\\d+\\s(.*)", "\\1", .x, perl = TRUE))) %>%
mutate(across(c(x, y), ~ gsub("(.*)\\s$?", "\\1", .x)))
x y cdf abscorr
1 Inlet Outlet 0.94460489 0.94460489
2 Inlet Total 0.87480985 0.87480985
3 Outlet Total 0.69187830 0.69187830
4 Outlet BOX 0.52404316 0.52404316
5 Total BOX 0.94397362 0.94397362
6 Total BOX Total 0.68779904 0.68779904
7 BOX BOX Flow 0.51054070 0.51054070
8 BOX 1A 0.98907793 0.98907793
9 1A Gett -0.58787388 0.58787388
10 2A Outlet 0.03679823 0.03679823
11 3A 4A 0.78073388 0.78073388
12 4A Outlet 0.11699066 0.11699066
13 Delta Inlet 0.77080311 0.77080311
Upvotes: 3
Reputation: 351
I think this is what you're looking for :
my_test9 <- data.frame("x" = c("CASE TEST 9 a", "CASE TEST 9 b"),
"y" = c("CASE TEST 9 c", "CASE TEST 9 d"),
"cdf" = c(0.9, 0.1),
"abscorr" = c(0.9, 0.1),
stringsAsFactors = FALSE)
my_test10 <- data.frame("x" = c("CASE TEST 10 a", "CASE TEST 10 b"),
"y" = c("CASE TEST 10 c", "CASE TEST 10 d"),
"cdf" = c(0.9, 0.1),
"abscorr" = c(0.9, 0.1),
stringsAsFactors = FALSE)
my_test9$Number <- 9
my_test10$Number <- 10
my_list <- list(my_test9, my_test10)
my_function <- function(x) {
my_pattern <- paste("CASE TEST ", unique(my_list[[x]]$Number), sep = "")
my_list[[x]]$x <<- gsub(pattern = my_pattern, replacement = "", x = my_list[[x]]$x)
my_list[[x]]$y <<- gsub(pattern = my_pattern, replacement = "", x = my_list[[x]]$y)
}
my_var <- lapply(1:length(my_list), my_function)
my_test9 <- my_list[[1]]
my_test10 <- my_list[[2]]
Upvotes: 2