QMan5

Reputation: 779

Remove the same substring from each value in a column in R

I'm trying to clean up the data in a file. The field I'm trying to clean describes which file each record originally came from, so every value in the field ends in ".csv". I would like to remove this part of the value but keep the rest.

Here is an example of the field:

File Name
bagel.csv
donut.csv
hamburger.csv
carrots.csv

I would like the field to look something like this:

File Name
bagel
donut
hamburger
carrots

Is there a way to do this in R? Any assistance would be extremely appreciated.

Upvotes: 0

Views: 108

Answers (3)

akrun

Reputation: 887038

We can use file_path_sans_ext() from the tools package:

tools::file_path_sans_ext(field)
#[1] "aa" "bb" "cc"

data

field <- c("aa.csv", "bb.csv", "cc.csv")
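Since the question is about a column, here is a minimal sketch of the same call applied to a data frame. The data frame `df` and its column `FileName` are assumptions made for illustration, modelled on the example in the question. Note that file_path_sans_ext() strips any trailing extension, not only ".csv".

```r
# Hypothetical data frame mirroring the example in the question
df <- data.frame(
  FileName = c("bagel.csv", "donut.csv", "hamburger.csv", "carrots.csv"),
  stringsAsFactors = FALSE
)

# Overwrite the column with the extension-free names
df$FileName <- tools::file_path_sans_ext(df$FileName)

print(df$FileName)
#[1] "bagel"     "donut"     "hamburger" "carrots"
```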

Upvotes: 0

dario

Reputation: 6485

It's always better to provide a minimal reproducible example:

field <- c("aa.csv", "bb.csv", "cc.csv")

gsub("\\.csv$", "", field)

Returns:

[1] "aa" "bb" "cc"

Explanation:

We can use regex to substitute the sequence:

"." (\\.) followed by "csv" (csv) followed by the end of the string ($)

with an empty string ("")

By following the suggestion from @G5W and anchoring the pattern with $, we only remove the extension and don't accidentally replace the string if it appears in the middle of a value (for example, in "function.csv.txt" we wouldn't want to replace the ".csv" part).
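To make the effect of the escaped dot and the $ anchor concrete, here is a small sketch comparing the anchored pattern with a naive one. The file names in `files` are made up for illustration.

```r
# Hypothetical file names chosen to show why "\\." and "$" matter
files <- c("bagel.csv", "function.csv.txt", "mycsv.csv")

# Escaped dot plus end-of-string anchor: only a trailing ".csv" is removed
anchored <- sub("\\.csv$", "", files)
print(anchored)
#[1] "bagel"            "function.csv.txt" "mycsv"

# Unescaped dot, no anchor: "." matches any character and the
# pattern can match anywhere in the string
unanchored <- gsub(".csv", "", files)
print(unanchored)
#[1] "bagel"        "function.txt" "m"
```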

Upvotes: 5

camnesia

Reputation: 2323

You can also use dplyr:

library(dplyr)

df <- data.frame(FileName = c('bagel.csv','donut.csv','hamburger.csv','carrots.csv'))

df <- df %>% mutate(FileName = gsub("\\..*","",FileName))
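One thing to be aware of: the pattern "\\..*" removes everything from the first dot onward, which matches the desired result for the example data but behaves differently when a name contains more than one dot. A quick base R sketch, using a made-up vector `x`, shows the difference from an anchored ".csv" pattern:

```r
# Hypothetical names; the second one contains an extra dot
x <- c("carrots.csv", "report.v2.csv")

# Everything from the first dot onward is dropped
greedy <- gsub("\\..*", "", x)
print(greedy)
#[1] "carrots" "report"

# Only a trailing ".csv" is dropped
anchored <- sub("\\.csv$", "", x)
print(anchored)
#[1] "carrots"   "report.v2"
```

Either pattern can be used inside mutate() in the same way as above.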

Upvotes: 1
