Deano

Reputation: 12200

sed / awk - remove space in file name

I'm trying to replace the whitespace inside file names with underscores, while keeping the space that separates one file name from the next.

Input:

echo "File Name1.xml File Name3 report.xml" | sed 's/[[:space:]]/__/g'

However, the output is:

File__Name1.xml__File__Name3__report.xml

Desired output

File__Name1.xml File__Name3__report.xml

Upvotes: 1

Views: 1800

Answers (4)

gboffi

Reputation: 25093

You named awk in the title of the question, didn't you?

$ echo "File Name1.xml File Name3 report.xml" | \
> awk -F'[.]xml *' '{for(i=1;i<=NF;i++){gsub(" ","_",$i); printf "%s", (i<NF ? $i ".xml " : "\n")}}'
File_Name1.xml File_Name3_report.xml
$
  • -F'[.]xml *' tells awk to split fields on a regex: the literal extension followed by zero or more spaces (the dot is bracketed so that it matches only a dot)
  • the loop for(i=1;i<=NF;i++) runs over all the fields into which each input line is split; note that the last field is empty (it is whatever follows the last extension), but we are going to take that into account...
    The body of the loop:
    • gsub(" ","_",$i) replaces every space in the current field, indexed by the loop variable i, with an underscore
    • printf "%s", (i<NF ? $i ".xml " : "\n") prints different things: if i<NF it's a regular field, so we append the extension and a space; otherwise i equals NF and we just terminate the output line with a newline. The explicit "%s" format keeps a % in a file name from being interpreted by printf.

It's not perfect: it appends a space after the last filename. I hope that's good enough...
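
To make the empty last field visible, here is a small sketch that prints every field in brackets. The function name show_fields is made up for this illustration, and the input is the sample line from the question.

```shell
# Hypothetical helper: list the fields awk sees when the extension
# (plus any following spaces) is used as the field separator.
show_fields() {
    awk -F'[.]xml *' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }'
}

echo "File Name1.xml File Name3 report.xml" | show_fields
# 1: [File Name1]
# 2: [File Name3 report]
# 3: []
```

The third, empty field is what follows the final ".xml", which is why the loop has to treat i equal to NF differently.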


▶    A D D E N D U M    ◀

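I'd like to address the trailing space mentioned above and to stop hard-coding the extension.

To reach these goals, I've decided to wrap the scriptlet in a shell function which, since it changes spaces into underscores, is named s2u. It takes the extension as its argument.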

$ s2u () { awk -F'[.]'"$1"' *' -v ext=".$1" '{
> NF--;for(i=1;i<=NF;i++){gsub(" ","_",$i);printf "%s",$i ext (i<NF?" ":"\n")}}'
> }
$ echo "File Name1.xml File Name3 report.xml" | s2u xml
File_Name1.xml File_Name3_report.xml
$

It's a bit different (better?) because it does not special-case the printing of the last field; instead it drops that empty field and special-cases the delimiter appended to each remaining field. The idea of splitting on the extension remains.
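
Since the question also mentions sed, here is a minimal sketch of the same "the extension marks the end of a name" idea in sed. It assumes every name ends in .xml and that names are separated by plain spaces. The function name spaces2underscores is made up for the example, and -E needs a sed with extended regex support (GNU and BSD sed both have it).

```shell
# Hypothetical helper: first turn every space into "__", then turn the
# run of "__" that directly follows an extension back into one space.
spaces2underscores() {
    sed -E 's/ /__/g; s/\.xml(__)+/.xml /g'
}

echo "File Name1.xml File Name3 report.xml" | spaces2underscores
# File__Name1.xml File__Name3__report.xml
```

A name that itself begins with underscores right after a separator would lose them in the second substitution, so treat this as a sketch for well-behaved input.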

Upvotes: 1

Ulysse BN

Reputation: 11413

You could use rename:

rename --nows *.xml

This replaces the whitespace in the names of all the XML files in the current folder with _.

Some versions of rename come without the --nows option; in that case you can use a search and replace instead:

rename 's/[[:space:]]/__/g' *.xml

Optionally, you can add --dry-run if you just want to print the new file names without actually renaming anything.
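
The commands above assume the Perl-based rename. The util-linux rename found on some systems takes plain "from to" arguments rather than an s/// expression, so where the Perl one is missing, a plain-shell fallback along these lines may help. This is only a sketch: fix_spaces and its -n switch are names invented here, and it handles spaces only, not tabs or other whitespace.

```shell
# Hypothetical fallback, not part of rename. Usage: fix_spaces [-n] DIR
# Replaces spaces with underscores in the names of DIR/*.xml.
# With -n it only prints what it would do (a dry run).
fix_spaces() {
    dry=no
    if [ "$1" = "-n" ]; then dry=yes; shift; fi
    for f in "$1"/*.xml; do
        [ -e "$f" ] || continue                  # the glob matched nothing
        old=${f##*/}                             # name without the directory
        new=$(printf '%s' "$old" | tr ' ' '_')   # spaces only, not tabs
        [ "$old" = "$new" ] && continue          # nothing to change
        if [ "$dry" = yes ]; then
            printf '%s -> %s\n' "$old" "$new"
        else
            mv -- "$f" "$1/$new"
        fi
    done
}

# fix_spaces -n .    # preview
# fix_spaces .       # actually rename
```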

Upvotes: 0

Lizardx

Reputation: 1195

Assuming you are asking how to rename files, and not how to remove spaces from a list of file names that is used for some other purpose, here are the long way and the short way. The long way uses sed; the short way uses rename. If you are not trying to rename files, your question is quite unclear and should be revised.

If the goal is simply to take a list of XML file names and change them with sed, the last example shows how to do that.

directory contents:

ls -w 2
bob is over there.xml
fred is here.xml
greg is there.xml

cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do 
   echo "${a_glob[i]}";
done
shopt -u nullglob
# output
bob is over there.xml
fred is here.xml
greg is there.xml

# then rename them
cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do 
   # I prefer 'rename' for such things
   # rename 's/[[:space:]]/_/g' "${a_glob[i]}";
   # but sed works, can't see any reason to use it for this purpose though
   mv -- "${a_glob[i]}" "$(sed 's/[[:space:]]/_/g' <<< "${a_glob[i]}")";
done
shopt -u nullglob

result:

ls -w 2
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml

Globbing is what you want here, because it handles the spaces in the names correctly.

However, this is really a complicated solution, when actually all you need to do is:

cd [your space containing directory]
rename 's/[[:space:]]/_/g' *.xml

and that's it, you're done.

If, on the other hand, you are trying to create a list of file names, you certainly want the globbing method. With a small change to the loop body it does that too: use sed to change the name that is printed instead of renaming the file.

If your goal is to change the filenames for output purposes, and not rename the actual files:

cd [directory with files]
shopt -s nullglob
a_glob=(*.xml);
for ((i=0;i< ${#a_glob[@]}; i++));do 
   echo "${a_glob[i]}" | sed 's/[[:space:]]/_/g';
done
shopt -u nullglob
# output:
bob_is_over_there.xml
fred_is_here.xml
greg_is_there.xml

Upvotes: 0

linden2015

Reputation: 887

This seems a good start if the filenames aren't delimited:

((?:\S.*?)?\.\w{1,})\b

(        // start of captured group
(?:      // non-captured group
\S.*?    // a non-white-space character, then 0 or more of any character (lazy)
)?       // 0 or 1 times
\.       // a dot
\w{1,}   // 1 or more word characters
)        // end of captured group
\b       // a word boundary

You'll have to look up how a PCRE pattern converts to a shell pattern. Alternatively, it can be run from a Python/Perl/PHP script.


Upvotes: 0
