Reputation: 197
The aim is to (1) extract variables (columns) with specific names (2) loop for all files in the folder and (3) save the output in csv files.
For example, the data look like for one file:
structure(list(frame = c(1, 2, 3, 4, 5), g_1 = c(0.1, 0.2, 0.1,
0, 0), g_2 = c(0, 0, 0, 0, 0), xy_1 = c(0.4, 0.43, 0.32, 0.33,
0.39), xy_2 = c(0, 0, 0, 0, 0.1), unit_1 = c(1, 1, 1, 1, 1),
unit_2 = c(0, 0, 0, 0, 0), unit_3 = c(0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA,
-5L), variable.labels = structure(character(0), .Names = character(0)), codepage = 65001L)
For each dataset, I only need the variables (1) frame and (2) those starting with "unit_". So the output I need to get and save in csv for each file looks like:
╔═══════╦════════╦════════╦════════╗
║ frame ║ unit_1 ║ unit_2 ║ unit_3 ║
╠═══════╬════════╬════════╬════════╣
║ 1.00 ║ 1.00 ║ .00 ║ .00 ║
╠═══════╬════════╬════════╬════════╣
║ 2.00 ║ 1.00 ║ .00 ║ .00 ║
╠═══════╬════════╬════════╬════════╣
║ 3.00 ║ 1.00 ║ .00 ║ .00 ║
╠═══════╬════════╬════════╬════════╣
║ 4.00 ║ 1.00 ║ .00 ║ .00 ║
╠═══════╬════════╬════════╬════════╣
║ 5.00 ║ 1.00 ║ .00 ║ .00 ║
╚═══════╩════════╩════════╩════════╝
I'm wondering if there is a way to do this using dplyr or others.
I was thinking of:
files <- list.files(path="C:/files", pattern="csv")
out <- lapply(files, function(file) {
dat <- data.frame(fread(paste0("C:/files/", file) ) )
dat <- dat[c("frame", "unit_1", "unit_2", "unit_3")]})
From here, I was not clear how to save each file into csv files.
Upvotes: 1
Views: 243
Reputation: 886948
We use select
with select_helpers
like starts_with
library(dplyr)
df1 %>%
select(frame, starts_with('unit'))
-output
# frame unit_1 unit_2 unit_3
#1 1 1 0 0
#2 2 1 0 0
#3 3 1 0 0
#4 4 1 0 0
#5 5 1 0 0
Or a regex in matches
to match the pattern that starts (^
) with the substring 'unit' followed by a _
and one or more digits (\\d+
) at the end ($
) of the string as column names
df1 %>%
select(frame, matches('^unit_\\d+$'))
If it is a list
of data.frame, loop over the list
with map/walk
, read the files with read_csv
, select the columns of interest and create new files in the same directory with prefix 'new_' using write_csv
library(purrr)
library(readr)
library(stringr)
walk(files, ~ read_csv(file.path("C:/files", .x)) %>%
select(frame, starts_with('unit')) %>%
write_csv(file.path("C:/files", str_c("new_", .x)))
)
The equivalent option in base R
is startsWith
subset(df1, select = c('frame',
names(df1)[startsWith(names(df1), 'unit')]))
Or with grep
subset(df1, select = grep('^(frame|unit)', names(df1), value = TRUE))
and for the list
of files
lapply(files, function(x) {
tmp <- subset(read.csv(file.path("C:/files", x)),
select = grep('^(frame|unit)', names(df1), value = TRUE))
write.csv(tmp, file.path("C:/files", paste0("new_", x)),
row.names = FALSE, quote = FALSE)
})
Upvotes: 1