Reputation: 46799
I want to get the filename (without extension) and the extension separately.
The best solution I found so far is:
NAME=`echo "$FILE" | cut -d'.' -f1`
EXTENSION=`echo "$FILE" | cut -d'.' -f2`
This is wrong because it doesn't work if the file name contains multiple .
characters. If, let's say, I have a.b.js
, it will consider a
and b.js
, instead of a.b
and js
.
It can be easily done in Python with
file, ext = os.path.splitext(path)
but I'd prefer not to fire up a Python interpreter just for this, if possible.
Any better ideas?
Upvotes: 2895
Views: 2372863
Reputation: 1307
Here are some alternative suggestions (mostly in awk
), including some advanced use cases, like extracting version numbers for software packages.
Just note that with a slightly different input some of these may fail, therefore anyone using these should validate on their expected input and adapt the regex expression if required.
f='/path/to/complex/file.1.0.1.tar.gz'
# Filename : 'file.1.0.x.tar.gz'
echo "$f" | awk -F'/' '{print $NF}'
# Extension (last): 'gz'
echo "$f" | awk -F'[.]' 'NF>1 {print $NF}'
# Extension (all) : '1.0.1.tar.gz'
echo "$f" | awk '{sub(/[^.]*[.]/, "", $0)} 1'
# Extension (last-2): 'tar.gz'
echo "$f" | awk -F'[.]' '{print $(NF-1)"."$NF}'
# Basename : 'file'
echo "$f" | awk '{gsub(/.*[/]|[.].*/, "", $0)} 1'
# Basename-extended : 'file.1.0.1.tar'
echo "$f" | awk '{gsub(/.*[/]|[.]{1}[^.]+$/, "", $0)} 1'
# Path : '/path/to/complex/'
echo "$f" | awk '{match($0, /.*[/]/, a); print a[0]}'
# or
echo "$f" | grep -Eo '.*[/]'
# Folder (containing the file) : 'complex'
echo "$f" | awk -F'/' '{$1=""; print $(NF-1)}'
# Version : '1.0.1'
# Defined as 'number.number' or 'number.number.number'
echo "$f" | grep -Eo '[0-9]+[.]+[0-9]+[.]?[0-9]?'
# Version - major : '1'
echo "$f" | grep -Eo '[0-9]+[.]+[0-9]+[.]?[0-9]?' | cut -d. -f1
# Version - minor : '0'
echo "$f" | grep -Eo '[0-9]+[.]+[0-9]+[.]?[0-9]?' | cut -d. -f2
# Version - patch : '1'
echo "$f" | grep -Eo '[0-9]+[.]+[0-9]+[.]?[0-9]?' | cut -d. -f3
# All Components : "path to complex file 1 0 1 tar gz"
echo "$f" | awk -F'[/.]' '{$1=""; print $0}'
# Is absolute : True (exit-code : 0)
# Return true if it is an absolute path (starting with '/' or '~/'
echo "$f" | grep -q '^[/]\|^~/'
All use cases are using the original full path as input, without depending on intermediate results.
Upvotes: 40
Reputation: 53
Funfact: if you convert .
to /
, basename thinks the extension is the filename.
basename "`tr . / <<< "$filename"`"
Kind of silly, but short. Variable expansion is better though.
Upvotes: 0
Reputation: 4347
echo {} | grep -Eo "\w+$" # alphanumeric characters
or
echo {} | grep -Eo "[a-z]+$" # just lower case characters
or
echo {} | grep -Eo "[^.]+$" # any character outside the "."
or
echo {} | grep -Eo "\w+\.\w+$" # last two segments e.g. test.ts
# and so on
Imo, this approach is worth a mention because using grep extended regex to match against the last segment(s) a.k.a extension is simple but effective. Note the $
symbol denoting the end of the pattern/path.
Upvotes: 0
Reputation: 475
$ F="text file.test.txt"
$ echo ${F/*./}
txt
This caters for multiple dots and spaces in a filename, however if there is no extension it returns the filename itself. Easy to check for though; just test for the filename and extension being the same.
Naturally this method doesn't work for .tar.gz files. However that could be handled in a two step process. If the extension is gz then check again to see if there is also a tar extension.
Upvotes: 14
Reputation: 17218
No previous answer used a bash regex
Here's a pure bash solution that splits a path into:
/
when present/
is so much longer that I didn't post it.
The code is meant to handle every possible case, you're welcome to try it.
#!/bin/bash
for path; do
####### the relevant part ######
[[ $path =~ ^(\.{1,2}|.*/\.{0,2})$|^(.*/)([^/]+)(\.[^/]*)$|^(.*/)(.+)$|^(.+)(\..*)$|^(.+)$ ]]
dirpath=${BASH_REMATCH[1]}${BASH_REMATCH[2]}${BASH_REMATCH[5]}
filename=${BASH_REMATCH[3]}${BASH_REMATCH[6]}${BASH_REMATCH[7]}${BASH_REMATCH[9]}
filext=${BASH_REMATCH[4]}${BASH_REMATCH[8]}
# dirpath should be non-null
[[ $dirpath ]] || dirpath='.'
################################
printf '%s=%q\n' \
path "$path" \
dirpath "$dirpath" \
filename "$filename" \
filext "$filext"
done
How does it work?
Basically, it ensures that only one sub-expression (delimited with |
in the regex) is able to capture the input. Thanks to that, you can concatenate all the capture groups of the same type (for example, the ones related to the directory path) stored in BASH_REMATCH
, because at most one will be non-null.
+--------------------------------------------------------+
| input dirpath filename filext |
+--------------------------------------------------------+
'' . '' ''
. . '' ''
.. .. '' ''
... . .. .
.file . .file ''
.file. . .file .
.file.. . .file. .
.file.Z . .file .Z
.file.sh.Z . .file.sh .Z
file . file ''
file. . file .
file.. . file. .
file.Z . file .Z
file.sh.Z . file.sh .Z
dir/ dir/ '' ''
dir/. dir/. '' ''
dir/... dir/ .. .
dir/.file dir/ .file ''
dir/.file. dir/ .file .
dir/.file.. dir/ .file. .
dir/.file.Z dir/ .file .Z
dir/.file.x.Z dir/ .file.x .Z
dir/file dir/ file ''
dir/file. dir/ file .
dir/file.. dir/ file. .
dir/file.Z dir/ file .Z
dir/file.x.Z dir/ file.x .Z
dir./. dir./. '' ''
dir./... dir./ .. .
dir./.file dir./ .file ''
dir./.file. dir./ .file .
dir./.file.. dir./ .file. .
dir./.file.Z dir./ .file .Z
dir./.file.sh.Z dir./ .file.sh .Z
dir./file dir./ file ''
dir./file. dir./ file .
dir./file.. dir./ file. .
dir./file.Z dir./ file .Z
dir./file.x.Z dir./ file.x .Z
dir// dir// '' ''
dir//. dir//. '' ''
dir//... dir// .. .
dir//.file dir// .file ''
dir//.file. dir// .file .
dir//.file.. dir// .file. .
dir//.file.Z dir// .file .Z
dir//.file.x.Z dir// .file.x .Z
dir//file dir// file ''
dir//file. dir// file .
dir//file.. dir// file. .
dir//file.Z dir// file .Z
dir//file.x.Z dir// file.x .Z
dir.//. dir.//. '' ''
dir.//... dir.// .. .
dir.//.file dir.// .file ''
dir.//.file. dir.// .file .
dir.//.file.. dir.// .file. .
dir.//.file.Z dir.// .file .Z
dir.//.file.x.Z dir.// .file.x .Z
dir.//file dir.// file ''
dir.//file. dir.// file .
dir.//file.. dir.// file. .
dir.//file.Z dir.// file .Z
dir.//file.x.Z dir.// file.x .Z
/ / '' ''
/. /. '' ''
/.. /.. '' ''
/... / .. .
/.file / .file ''
/.file. / .file .
/.file.. / .file. .
/.file.Z / .file .Z
/.file.sh.Z / .file.sh .Z
/file / file ''
/file. / file .
/file.. / file. .
/file.Z / file .Z
/file.sh.Z / file.sh .Z
/dir/ /dir/ '' ''
/dir/. /dir/. '' ''
/dir/... /dir/ .. .
/dir/.file /dir/ .file ''
/dir/.file. /dir/ .file .
/dir/.file.. /dir/ .file. .
/dir/.file.Z /dir/ .file .Z
/dir/.file.x.Z /dir/ .file.x .Z
/dir/file /dir/ file ''
/dir/file. /dir/ file .
/dir/file.. /dir/ file. .
/dir/file.Z /dir/ file .Z
/dir/file.x.Z /dir/ file.x .Z
/dir./. /dir./. '' ''
/dir./... /dir./ .. .
/dir./.file /dir./ .file ''
/dir./.file. /dir./ .file .
/dir./.file.. /dir./ .file. .
/dir./.file.Z /dir./ .file .Z
/dir./.file.sh.Z /dir./ .file.sh .Z
/dir./file /dir./ file ''
/dir./file. /dir./ file .
/dir./file.. /dir./ file. .
/dir./file.Z /dir./ file .Z
/dir./file.x.Z /dir./ file.x .Z
/dir// /dir// '' ''
/dir//. /dir//. '' ''
/dir//... /dir// .. .
/dir//.file /dir// .file ''
/dir//.file. /dir// .file .
/dir//.file.. /dir// .file. .
/dir//.file.Z /dir// .file .Z
/dir//.file.x.Z /dir// .file.x .Z
/dir//file /dir// file ''
/dir//file. /dir// file .
/dir//file.. /dir// file. .
/dir//file.Z /dir// file .Z
/dir//file.x.Z /dir// file.x .Z
/dir.//. /dir.//. '' ''
/dir.//... /dir.// .. .
/dir.//.file /dir.// .file ''
/dir.//.file. /dir.// .file .
/dir.//.file.. /dir.// .file. .
/dir.//.file.Z /dir.// .file .Z
/dir.//.file.x.Z /dir.// .file.x .Z
/dir.//file /dir.// file ''
/dir.//file. /dir.// file .
/dir.//file.. /dir.// file. .
/dir.//file.Z /dir.// file .Z
/dir.//file.x.Z /dir.// file.x .Z
// // '' ''
//. //. '' ''
//.. //.. '' ''
//... // .. .
//.file // .file ''
//.file. // .file .
//.file.. // .file. .
//.file.Z // .file .Z
//.file.sh.Z // .file.sh .Z
//file // file ''
//file. // file .
//file.. // file. .
//file.Z // file .Z
//file.sh.Z // file.sh .Z
//dir/ //dir/ '' ''
//dir/. //dir/. '' ''
//dir/... //dir/ .. .
//dir/.file //dir/ .file ''
//dir/.file. //dir/ .file .
//dir/.file.. //dir/ .file. .
//dir/.file.Z //dir/ .file .Z
//dir/.file.x.Z //dir/ .file.x .Z
//dir/file //dir/ file ''
//dir/file. //dir/ file .
//dir/file.. //dir/ file. .
//dir/file.Z //dir/ file .Z
//dir/file.x.Z //dir/ file.x .Z
//dir./. //dir./. '' ''
//dir./... //dir./ .. .
//dir./.file //dir./ .file ''
//dir./.file. //dir./ .file .
//dir./.file.. //dir./ .file. .
//dir./.file.Z //dir./ .file .Z
//dir./.file.sh.Z //dir./ .file.sh .Z
//dir./file //dir./ file ''
//dir./file. //dir./ file .
//dir./file.. //dir./ file. .
//dir./file.Z //dir./ file .Z
//dir./file.x.Z //dir./ file.x .Z
//dir// //dir// '' ''
//dir//. //dir//. '' ''
//dir//... //dir// .. .
//dir//.file //dir// .file ''
//dir//.file. //dir// .file .
//dir//.file.. //dir// .file. .
//dir//.file.Z //dir// .file .Z
//dir//.file.x.Z //dir// .file.x .Z
//dir//file //dir// file ''
//dir//file. //dir// file .
//dir//file.. //dir// file. .
//dir//file.Z //dir// file .Z
//dir//file.x.Z //dir// file.x .Z
//dir.//. //dir.//. '' ''
//dir.//... //dir.// .. .
//dir.//.file //dir.// .file ''
//dir.//.file. //dir.// .file .
//dir.//.file.. //dir.// .file. .
//dir.//.file.Z //dir.// .file .Z
//dir.//.file.x.Z //dir.// .file.x .Z
//dir.//file //dir.// file ''
//dir.//file. //dir.// file .
//dir.//file.. //dir.// file. .
//dir.//file.Z //dir.// file .Z
//dir.//file.x.Z //dir.// file.x .Z
As you can see, the behaviour is different from basename
and dirname
.
For example basename dir/
outputs dir
while the regex will give you an empty filename for it. Same for .
and ..
, those are considered directories, not filenames.
I timed it with 10000 paths of 256 characters and it took about 1 second, while the equivalent POSIX shell solution is 2x slower and solutions based on wild forking (external calls inside the for
loop) are at least 60x slower.
remark: It's not necessary to test paths that contain \n
or other notorious characters because all characters are handled the same way by the regex engine of bash. The only characters that would be able to break the current logic are /
and .
, intermixed or multiplied in a currently unexpected way. When I first posted my answer I found a few border cases that I had to fix; I can't say that the regex is 100% bullet proof but it should be quite robust now.
As an aside, here's the POSIX shell solution that yields the same output:
#!/bin/sh
for path; do
####### the relevant part ######
fullname=${path##*/}
case $fullname in
. | ..)
dirpath="$path"
filename=''
filext=''
;;
*)
dirpath=${path%"$fullname"}
dirpath=${dirpath:-.} # dirpath should be non-null
filename=${fullname#.}
filename="${fullname%"$filename"}${filename%.*}"
filext=${fullname#"$filename"}
;;
esac
################################
printf '%s=%s\n' \
path "$path" \
dirpath "$dirpath" \
filename "$filename" \
filext "$filext"
done
postscript: There are a few points for which some people might disagree with the results given by the above codes:
The special case of dotfiles: The reason is that dotfiles are a UNIX concept.
The special case of .
and ..
: IMHO it seems obvious to treat them as directories, but most libraries don't do that and force the user to post-process the result.
No support for double-extensions: That's because you'd need a whole database for storing all the valid double-extensions, and above all, because file extensions don't mean anything in UNIX; for example you can call a tar archive my_tarred_files
and that's completely fine, you'll be able to tar xf my_tarred_files
without any problem.
Upvotes: 8
Reputation: 882296
pax> echo a.b.js | sed 's/\.[^.]*$//'
a.b
pax> echo a.b.js | sed 's/^.*\.//'
js
works fine, so you can just use:
pax> FILE=a.b.js
pax> NAME=$(echo "$FILE" | sed 's/\.[^.]*$//')
pax> EXTENSION=$(echo "$FILE" | sed 's/^.*\.//')
pax> echo $NAME
a.b
pax> echo $EXTENSION
js
The commands, by the way, work as follows.
The command for NAME
substitutes a "."
character followed by any number of non-"."
characters up to the end of the line, with nothing (i.e., it removes everything from the final "."
to the end of the line, inclusive). This is basically a non-greedy substitution using regex trickery.
The command for EXTENSION
substitutes a any number of characters followed by a "."
character at the start of the line, with nothing (i.e., it removes everything from the start of the line to the final dot, inclusive). This is a greedy substitution which is the default action.
Upvotes: 80
Reputation: 4903
You can use the magic of POSIX parameter expansion:
bash-3.2$ FILENAME=somefile.tar.gz
bash-3.2$ echo "${FILENAME%%.*}"
somefile
bash-3.2$ echo "${FILENAME%.*}"
somefile.tar
There's a caveat in that if your filename was of the form ./somefile.tar.gz
then echo ${FILENAME%%.*}
would greedily remove the longest match to the .
and you'd have the empty string.
(You can work around that with a temporary variable:
FULL_FILENAME=$FILENAME
FILENAME=${FULL_FILENAME##*/}
echo ${FILENAME%%.*}
)
This site explains more.
${variable%pattern}
Trim the shortest match from the end
${variable##pattern}
Trim the longest match from the beginning
${variable%%pattern}
Trim the longest match from the end
${variable#pattern}
Trim the shortest match from the beginning
Upvotes: 221
Reputation: 41437
~% FILE="example.tar.gz"
~% echo "${FILE%%.*}"
example
~% echo "${FILE%.*}"
example.tar
~% echo "${FILE#*.}"
tar.gz
~% echo "${FILE##*.}"
gz
For more details, see shell parameter expansion in the Bash manual.
Upvotes: 1195
Reputation: 4173
This is the only one that worked for me:
path='folder/other_folder/file.js'
base=${path##*/}
echo ${base%.*}
>> file
This can also be used in string interpolation as well, but unfortunately you have to set base
beforehand.
Upvotes: 13
Reputation: 9
You can also use a for
loop and tr
to extract the filename from the path...
for x in `echo $path | tr "/" " "`; do filename=$x; done
The tr
replaces all "/" delimiters in path with spaces so making a list of strings, and the for
loop scans through them leaving the last one in the filename
variable.
Upvotes: -1
Reputation: 720
Here is a sed
solution that extracts path components in a variety of forms and can handle most edge cases:
## Enter the input path and field separator character, for example:
## (separatorChar must not be present in inputPath)
inputPath="/path/to/Foo.bar"
separatorChar=":"
## sed extracts the path components and assigns them to output variables
oldIFS="$IFS"
IFS="$separatorChar"
read dirPathWithSlash dirPath fileNameWithExt fileName fileExtWithDot fileExt <<<"$(sed -En '
s/^[[:space:]]+//
s/[[:space:]]+$//
t l1
:l1
s/^([^/]|$)//
t
s/[/]+$//
t l2
:l2
s/^$/filesystem\/\
filesystem/p
t
h
s/^(.*)([/])([^/]+)$/\1\2\
\1\
\3/p
g
t l3
:l3
s/^.*[/]([^/]+)([.])([a-zA-Z0-9]+)$/\1\
\2\3\
\3/p
t
s/^.*[/](.+)$/\1/p
' <<<"$inputPath" | tr "\n" "$separatorChar")"
IFS="$oldIFS"
## Results (all use separatorChar=":")
## inputPath = /path/to/Foo.bar
## dirPathWithSlash = /path/to/
## dirPath = /path/to
## fileNameWithExt = Foo.bar
## fileName = Foo
## fileExtWithDot = .bar
## fileExt = bar
## inputPath = /path/to/Foobar
## dirPathWithSlash = /path/to/
## dirPath = /path/to
## fileNameWithExt = Foobar
## fileName = Foobar
## fileExtWithDot =
## fileExt =
## inputPath = /path/to/...bar
## dirPathWithSlash = /path/to/
## dirPath = /path/to
## fileNameWithExt = ...bar
## fileName = ..
## fileExtWithDot = .bar
## fileExt = bar
## inputPath = /path/to/..bar
## dirPathWithSlash = /path/to/
## dirPath = /path/to
## fileNameWithExt = ..bar
## fileName = .
## fileExtWithDot = .bar
## fileExt = bar
## inputPath = /path/to/.bar
## dirPathWithSlash = /path/to/
## dirPath = /path/to
## fileNameWithExt = .bar
## fileName = .bar
## fileExtWithDot =
## fileExt =
## inputPath = /path/to/...
## dirPathWithSlash = /path/to/
## dirPath = /path/to
## fileNameWithExt = ...
## fileName = ...
## fileExtWithDot =
## fileExt =
## inputPath = /path/to/Foo.
## dirPathWithSlash = /path/to/
## dirPath = /path/to
## fileNameWithExt = Foo.
## fileName = Foo.
## fileExtWithDot =
## fileExt =
## inputPath = / (the root directory)
## dirPathWithSlash = filesystem/
## dirPath = filesystem
## fileNameWithExt =
## fileName =
## fileExtWithDot =
## fileExt =
## inputPath = (invalid because empty)
## dirPathWithSlash =
## dirPath =
## fileNameWithExt =
## fileName =
## fileExtWithDot =
## fileExt =
## inputPath = Foo/bar (invalid because doesn't start with a forward slash)
## dirPathWithSlash =
## dirPath =
## fileNameWithExt =
## fileName =
## fileExtWithDot =
## fileExt =
Here's how it works:
sed
parses the input path and prints the following path components in order on separate lines:
tr
converts the sed
output into a separator character-delimited string of the above path components.
read
uses the separator character as the field separator (IFS="$separatorChar"
) and assigns each of the path components to its respective variable.
Here's how the sed
construct works:
s/^[[:space:]]+//
and s/[[:space:]]+$//
strip any leading and/or trailing whitespace characterst l1
and :l1
refreshes the t
function for the next s
functions/^([^/]|$)//
and t
tests for an invalid input path (one that does not begin with a forward slash), in which case it leaves all output lines blank and quits the sed
commands/[/]+$//
strips any trailing slashest l2
and :l2
refreshes the t
function for the next s
functions/^$/filesystem\/\\[newline]filesystem/p
and t
tests for the special case where the input path consists of the root directory /, in which case it prints filesystem/ and filesystem for the dirPathWithSlash and dirPath output lines, leaves all other output lines blank, and quits the sed commandh
saves the input path in the hold spaces/^(.*)([/])([^/]+)$/\1\2\\[newline]\1\\[newline]\3/p
prints the dirPathWithSlash, dirPath, and fileNameWithExt output linesg
retrieves the input path from the hold spacet l3
and :l3
refreshes the t
function for the next s
functions/^.*\[/]([^/]+)([.])([a-zA-Z0-9]+)$/\1\\[newline]\2\3\\[newline]\3/p
and t
prints the fileName, fileExtWithDot, and fileExt output lines for the case where a file extension exists, (assumed to consist of alphanumeric characters only), then quits the sed
commands/^.*\[/](.+)$/\1/p
prints the fileName but not the fileExtWithDot, and fileExt output lines for the case where a file extension does not exist, then quits the sed
command.Upvotes: 0
Reputation: 16473
Based largely off of @mklement0's excellent, and chock-full of random, useful bashisms - as well as other answers to this / other questions / "that darn internet"... I wrapped it all up in a little, slightly more comprehensible, reusable function for my (or your) .bash_profile
that takes care of what (I consider) should be a more robust version of dirname
/basename
/ what have you..
function path { SAVEIFS=$IFS; IFS="" # stash IFS for safe-keeping, etc.
[[ $# != 2 ]] && echo "usage: path <path> <dir|name|fullname|ext>" && return # demand 2 arguments
[[ $1 =~ ^(.*/)?(.+)?$ ]] && { # regex parse the path
dir=${BASH_REMATCH[1]}
file=${BASH_REMATCH[2]}
ext=$([[ $file = *.* ]] && printf %s ${file##*.} || printf '')
# edge cases for extensionless files and files like ".nesh_profile.coffee"
[[ $file == $ext ]] && fnr=$file && ext='' || fnr=${file:0:$((${#file}-${#ext}))}
case "$2" in
dir) echo "${dir%/*}"; ;;
name) echo "${fnr%.*}"; ;;
fullname) echo "${fnr%.*}.$ext"; ;;
ext) echo "$ext"; ;;
esac
}
IFS=$SAVEIFS
}
Usage examples...
SOMEPATH=/path/to.some/.random\ file.gzip
path $SOMEPATH dir # /path/to.some
path $SOMEPATH name # .random file
path $SOMEPATH ext # gzip
path $SOMEPATH fullname # .random file.gzip
path gobbledygook # usage: -bash <path> <dir|name|fullname|ext>
Upvotes: 5
Reputation: 439767
The accepted answer works well in typical cases, but fails in edge cases, namely:
extension=${filename##*.}
returns the input filename rather than an empty string.extension=${filename##*.}
does not include the initial .
, contrary to convention.
.
would not work for filenames without suffix.filename="${filename%.*}"
will be the empty string, if the input file name starts with .
and contains no further .
characters (e.g., .bash_profile
) - contrary to convention. Thus, the complexity of a robust solution that covers all edge cases calls for a function - see its definition below; it can return all components of a path.
Example call:
splitPath '/etc/bash.bashrc' dir fname fnameroot suffix
# -> $dir == '/etc'
# -> $fname == 'bash.bashrc'
# -> $fnameroot == 'bash'
# -> $suffix == '.bashrc'
Note that the arguments after the input path are freely chosen, positional variable names.
To skip variables not of interest that come before those that are, specify _
(to use throw-away variable $_
) or ''
; e.g., to extract filename root and extension only, use splitPath '/etc/bash.bashrc' _ _ fnameroot extension
.
# SYNOPSIS
# splitPath path varDirname [varBasename [varBasenameRoot [varSuffix]]]
# DESCRIPTION
# Splits the specified input path into its components and returns them by assigning
# them to variables with the specified *names*.
# Specify '' or throw-away variable _ to skip earlier variables, if necessary.
# The filename suffix, if any, always starts with '.' - only the *last*
# '.'-prefixed token is reported as the suffix.
# As with `dirname`, varDirname will report '.' (current dir) for input paths
# that are mere filenames, and '/' for the root dir.
# As with `dirname` and `basename`, a trailing '/' in the input path is ignored.
# A '.' as the very first char. of a filename is NOT considered the beginning
# of a filename suffix.
# EXAMPLE
# splitPath '/home/jdoe/readme.txt' parentpath fname fnameroot suffix
# echo "$parentpath" # -> '/home/jdoe'
# echo "$fname" # -> 'readme.txt'
# echo "$fnameroot" # -> 'readme'
# echo "$suffix" # -> '.txt'
# ---
# splitPath '/home/jdoe/readme.txt' _ _ fnameroot
# echo "$fnameroot" # -> 'readme'
splitPath() {
local _sp_dirname= _sp_basename= _sp_basename_root= _sp_suffix=
# simple argument validation
(( $# >= 2 )) || { echo "$FUNCNAME: ERROR: Specify an input path and at least 1 output variable name." >&2; exit 2; }
# extract dirname (parent path) and basename (filename)
_sp_dirname=$(dirname "$1")
_sp_basename=$(basename "$1")
# determine suffix, if any
_sp_suffix=$([[ $_sp_basename = *.* ]] && printf %s ".${_sp_basename##*.}" || printf '')
# determine basename root (filemane w/o suffix)
if [[ "$_sp_basename" == "$_sp_suffix" ]]; then # does filename start with '.'?
_sp_basename_root=$_sp_basename
_sp_suffix=''
else # strip suffix from filename
_sp_basename_root=${_sp_basename%$_sp_suffix}
fi
# assign to output vars.
[[ -n $2 ]] && printf -v "$2" "$_sp_dirname"
[[ -n $3 ]] && printf -v "$3" "$_sp_basename"
[[ -n $4 ]] && printf -v "$4" "$_sp_basename_root"
[[ -n $5 ]] && printf -v "$5" "$_sp_suffix"
return 0
}
test_paths=(
'/etc/bash.bashrc'
'/usr/bin/grep'
'/Users/jdoe/.bash_profile'
'/Library/Application Support/'
'readme.new.txt'
)
for p in "${test_paths[@]}"; do
echo ----- "$p"
parentpath= fname= fnameroot= suffix=
splitPath "$p" parentpath fname fnameroot suffix
for n in parentpath fname fnameroot suffix; do
echo "$n=${!n}"
done
done
Test code that exercises the function:
test_paths=(
'/etc/bash.bashrc'
'/usr/bin/grep'
'/Users/jdoe/.bash_profile'
'/Library/Application Support/'
'readme.new.txt'
)
for p in "${test_paths[@]}"; do
echo ----- "$p"
parentpath= fname= fnameroot= suffix=
splitPath "$p" parentpath fname fnameroot suffix
for n in parentpath fname fnameroot suffix; do
echo "$n=${!n}"
done
done
Expected output - note the edge cases:
.
(not considered the start of the suffix)/
(trailing /
is ignored).
is returned as the parent path).
-prefixed token (only the last is considered the suffix):----- /etc/bash.bashrc
parentpath=/etc
fname=bash.bashrc
fnameroot=bash
suffix=.bashrc
----- /usr/bin/grep
parentpath=/usr/bin
fname=grep
fnameroot=grep
suffix=
----- /Users/jdoe/.bash_profile
parentpath=/Users/jdoe
fname=.bash_profile
fnameroot=.bash_profile
suffix=
----- /Library/Application Support/
parentpath=/Library
fname=Application Support
fnameroot=Application Support
suffix=
----- readme.new.txt
parentpath=.
fname=readme.new.txt
fnameroot=readme.new
suffix=.txt
Upvotes: 29
Reputation: 1145
IMHO the best solution has already been given (using shell parameter expansion) and are the best rated one at this time.
I however add this one which just use dumbs commands, which is not efficient and which noone serious should use ever :
FILENAME=$(echo $FILE | cut -d . -f 1-$(printf $FILE | tr . '\n' | wc -l))
EXTENSION=$(echo $FILE | tr . '\n' | tail -1)
Added just for fun :-)
Upvotes: 3
Reputation: 473
Smallest and simplest solution (in single line) is:
$ file=/blaabla/bla/blah/foo.txt
echo $(basename ${file%.*}) # foo
Upvotes: 25
Reputation: 94829
First, get file name without the path:
filename=$(basename -- "$fullfile")
extension="${filename##*.}"
filename="${filename%.*}"
Alternatively, you can focus on the last '/' of the path instead of the '.' which should work even if you have unpredictable file extensions:
filename="${fullfile##*/}"
You may want to check the documentation :
Upvotes: 4507
Reputation: 16706
Simply use ${parameter%word}
In your case:
${FILE%.*}
If you want to test it, all following work, and just remove the extension:
FILE=abc.xyz; echo ${FILE%.*};
FILE=123.abc.xyz; echo ${FILE%.*};
FILE=abc; echo ${FILE%.*};
Upvotes: 9
Reputation: 10892
If you also want to allow empty extensions, this is the shortest I could come up with:
echo 'hello.txt' | sed -r 's/.+\.(.+)|.*/\1/' # EXTENSION
echo 'hello.txt' | sed -r 's/(.+)\..+|(.*)/\1\2/' # FILENAME
1st line explained: It matches PATH.EXT or ANYTHING and replaces it with EXT. If ANYTHING was matched, the ext group is not captured.
Upvotes: 3
Reputation: 10974
No need to bother with awk
or sed
or even perl
for this simple task. There is a pure-Bash, os.path.splitext()
-compatible solution which only uses parameter expansions.
Documentation of os.path.splitext(path)
:
Split the pathname path into a pair
(root, ext)
such thatroot + ext == path
, and ext is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored;splitext('.cshrc')
returns('.cshrc', '')
.
Python code:
root, ext = os.path.splitext(path)
root="${path%.*}"
ext="${path#"$root"}"
root="${path#.}";root="${path%"$root"}${root%.*}"
ext="${path#"$root"}"
Here are test cases for the Ignoring leading periods implementation, which should match the Python reference implementation on every input.
|---------------|-----------|-------|
|path |root |ext |
|---------------|-----------|-------|
|' .txt' |' ' |'.txt' |
|' .txt.txt' |' .txt' |'.txt' |
|' txt' |' txt' |'' |
|'*.txt.txt' |'*.txt' |'.txt' |
|'.cshrc' |'.cshrc' |'' |
|'.txt' |'.txt' |'' |
|'?.txt.txt' |'?.txt' |'.txt' |
|'\n.txt.txt' |'\n.txt' |'.txt' |
|'\t.txt.txt' |'\t.txt' |'.txt' |
|'a b.txt.txt' |'a b.txt' |'.txt' |
|'a*b.txt.txt' |'a*b.txt' |'.txt' |
|'a?b.txt.txt' |'a?b.txt' |'.txt' |
|'a\nb.txt.txt' |'a\nb.txt' |'.txt' |
|'a\tb.txt.txt' |'a\tb.txt' |'.txt' |
|'txt' |'txt' |'' |
|'txt.pdf' |'txt' |'.pdf' |
|'txt.tar.gz' |'txt.tar' |'.gz' |
|'txt.txt' |'txt' |'.txt' |
|---------------|-----------|-------|
All tests passed.
Upvotes: 38
Reputation: 109
Ok so if I understand correctly, the problem here is how to get the name and the full extension of a file that has multiple extensions, e.g., stuff.tar.gz
.
This works for me:
fullfile="stuff.tar.gz"
fileExt=${fullfile#*.}
fileName=${fullfile%*.$fileExt}
This will give you stuff
as filename and .tar.gz
as extension. It works for any number of extensions, including 0. Hope this helps for anyone having the same problem =)
Upvotes: 10
Reputation: 183
Building from Petesh answer, if only the filename is needed, both path and extension can be stripped in a single line,
filename=$(basename ${fullname%.*})
Upvotes: 7
Reputation: 993
You can force cut to display all fields and subsequent ones adding -
to field number.
NAME=`basename "$FILE"`
EXTENSION=`echo "$NAME" | cut -d'.' -f2-`
So if FILE is eth0.pcap.gz
, the EXTENSION will be pcap.gz
Using the same logic, you can also fetch the file name using '-' with cut as follows :
NAME=`basename "$FILE" | cut -d'.' -f-1`
This works even for filenames that do not have any extension.
Upvotes: 14
Reputation: 507
A simple answer:
To expand on the POSIX variables answer, note that you can do more interesting patterns. So for the case detailed here, you could simply do this:
tar -zxvf $1
cd ${1%.tar.*}
That will cut off the last occurrence of .tar.<something>.
More generally, if you wanted to remove the last occurrence of .<something>.<something-else> then
${1.*.*}
should work fine.
The link the above answer appears to be dead. Here's a great explanation of a bunch of the string manipulation you can do directly in Bash, from TLDP.
Upvotes: 4
Reputation: 11185
From the answers above, the shortest oneliner to mimic Python's
file, ext = os.path.splitext(path)
presuming your file really does have an extension, is
EXT="${PATH##*.}"; FILE=$(basename "$PATH" .$EXT)
Upvotes: 1
Reputation: 70977
In addition to the lot of good answers on this Stack Overflow question I would like to add:
Under Linux and other unixen, there is a magic command named file
, that do filetype detection by analysing some first bytes of file. This is a very old tool, initialy used for print servers (if not created for... I'm not sure about that).
file myfile.txt
myfile.txt: UTF-8 Unicode text
file -b --mime-type myfile.txt
text/plain
Standards extensions could be found in /etc/mime.types
(on my Debian GNU/Linux desktop. See man file
and man mime.types
. Perhaps you have to install the file
utility and mime-support
packages):
grep $( file -b --mime-type myfile.txt ) </etc/mime.types
text/plain asc txt text pot brf srt
You could create a bash function for determining right extension. There is a little (not perfect) sample:
file2ext() {
local _mimetype=$(file -Lb --mime-type "$1") _line _basemimetype
case ${_mimetype##*[/.-]} in
gzip | bzip2 | xz | z )
_mimetype=${_mimetype##*[/.-]}
_mimetype=${_mimetype//ip}
_basemimetype=$(file -zLb --mime-type "$1")
;;
stream )
_mimetype=($(file -Lb "$1"))
[ "${_mimetype[1]}" = "compressed" ] &&
_basemimetype=$(file -b --mime-type - < <(
${_mimetype,,} -d <"$1")) ||
_basemimetype=${_mimetype,,}
_mimetype=${_mimetype,,}
;;
executable ) _mimetype='' _basemimetype='' ;;
dosexec ) _mimetype='' _basemimetype='exe' ;;
shellscript ) _mimetype='' _basemimetype='sh' ;;
* )
_basemimetype=$_mimetype
_mimetype=''
;;
esac
while read -a _line ;do
if [ "$_line" == "$_basemimetype" ] ;then
[ "$_line[1]" ] &&
_basemimetype=${_line[1]} ||
_basemimetype=${_basemimetype##*[/.-]}
break
fi
done </etc/mime.types
case ${_basemimetype##*[/.-]} in
executable ) _basemimetype='' ;;
shellscript ) _basemimetype='sh' ;;
dosexec ) _basemimetype='exe' ;;
* ) ;;
esac
[ "$_mimetype" ] && [ "$_basemimetype" != "$_mimetype" ] &&
printf ${2+-v} $2 "%s.%s" ${_basemimetype##*[/.-]} ${_mimetype##*[/.-]} ||
printf ${2+-v} $2 "%s" ${_basemimetype##*[/.-]}
}
This function could set a Bash variable that can be used later:
(This is inspired from @Petesh right answer):
filename=$(basename "$fullfile")
filename="${filename%.*}"
file2ext "$fullfile" extension
echo "$fullfile -> $filename . $extension"
Upvotes: 11
Reputation: 43421
Maybe there is an option in tar
to do this; did you check the man? Otherwise, you can use Bash string expansion:
test="mpc-1.0.1.tar.gz"
noExt="${test/.tar.gz/}" # Remove the string '.tar.gz'
echo $noExt
Upvotes: 0
Reputation: 77
Here is code with AWK. It can be done more simply. But I am not good in AWK.
filename$ ls
abc.a.txt a.b.c.txt pp-kk.txt
filename$ find . -type f | awk -F/ '{print $2}' | rev | awk -F"." '{$1="";print}' | rev | awk 'gsub(" ",".") ,sub(".$", "")'
abc.a
a.b.c
pp-kk
filename$ find . -type f | awk -F/ '{print $2}' | awk -F"." '{print $NF}'
txt
txt
txt
Upvotes: 6
Reputation: 197
I think that if you just need the name of the file, you can try this:
FULLPATH=/usr/share/X11/xorg.conf.d/50-synaptics.conf
# Remove all the prefix until the "/" character
FILENAME=${FULLPATH##*/}
# Remove all the prefix until the "." character
FILEEXTENSION=${FILENAME##*.}
# Remove a suffix, in our case, the filename. This will return the name of the directory that contains this file.
BASEDIRECTORY=${FULLPATH%$FILENAME}
echo "path = $FULLPATH"
echo "file name = $FILENAME"
echo "file extension = $FILEEXTENSION"
echo "base directory = $BASEDIRECTORY"
And that is all =D.
Upvotes: 18
Reputation: 467
Mellen writes in a comment on a blog post:
Using Bash, there’s also ${file%.*}
to get the filename without the extension and ${file##*.}
to get the extension alone. That is,
file="thisfile.txt"
echo "filename: ${file%.*}"
echo "extension: ${file##*.}"
Outputs:
filename: thisfile
extension: txt
Upvotes: 45
Reputation: 52778
Using example file /Users/Jonathan/Scripts/bash/MyScript.sh
, this code:
MY_EXT=".${0##*.}"
ME=$(/usr/bin/basename "${0}" "${MY_EXT}")
will result in ${ME}
being MyScript
and ${MY_EXT}
being .sh
:
Script:
#!/bin/bash
set -e
MY_EXT=".${0##*.}"
ME=$(/usr/bin/basename "${0}" "${MY_EXT}")
echo "${ME} - ${MY_EXT}"
Some tests:
$ ./MyScript.sh
MyScript - .sh
$ bash MyScript.sh
MyScript - .sh
$ /Users/Jonathan/Scripts/bash/MyScript.sh
MyScript - .sh
$ bash /Users/Jonathan/Scripts/bash/MyScript.sh
MyScript - .sh
Upvotes: 1