Reputation:
I am using Mac OS X Lion.
I have a folder, LITERATURE, with the following structure:
LITERATURE > Y > YATES, DORNFORD > THE BROTHER OF DAPHNE:
Chapters 01-05.txt
Chapters 06-10.txt
Chapters 11-end.txt
I want to recursively concatenate the chapters of every book that is split into multiple files (not all books are). Then I want to write each concatenated file to its parent's parent directory. The name of the concatenated file should be the same as the name of its parent directory.
For example, after running the script on the folder structure shown above, I should get the following:
LITERATURE > Y > YATES, DORNFORD:
THE BROTHER OF DAPHNE.txt
THE BROTHER OF DAPHNE:
Chapters 01-05.txt
Chapters 06-10.txt
Chapters 11-end.txt
In this example, the parent directory is THE BROTHER OF DAPHNE and the parent's parent directory is YATES, DORNFORD.
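The naming rule is plain path arithmetic: strip the last component of the book directory to get the parent's parent, and reuse that last component as the file name. A minimal sketch with Bash parameter expansion, assuming the book directory is passed as a path such as LITERATURE/Y/YATES, DORNFORD/THE BROTHER OF DAPHNE (the function name target_for is hypothetical):

```bash
#!/bin/bash
# Hypothetical helper: print the path of the concatenated file for a book directory.
# Assumes the argument contains at least one '/' (i.e. the book has a parent directory).
target_for() {
  local book=${1%/}                          # drop a trailing slash, if any
  # ${book%/*}  -> parent's parent directory of the chapter files
  # ${book##*/} -> name of the book directory, reused as the file name
  printf '%s/%s.txt\n' "${book%/*}" "${book##*/}"
}
```

For example, target_for 'LITERATURE/Y/YATES, DORNFORD/THE BROTHER OF DAPHNE' prints LITERATURE/Y/YATES, DORNFORD/THE BROTHER OF DAPHNE.txt.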
[Updated March 6th—Rephrased the question/answer so that the question/answer is easy to find and understand.]
Upvotes: 0
Views: 400
Reputation:
Thanks for all your input. It got me thinking, and I managed to concatenate the files using the following steps:
#!/bin/bash
# We are going to iterate through the directory tree, up to a maximum depth of 20.
for i in `seq 1 20`
do
# In UNIX-based systems, directories are files too ("everything is a file"), so one 'find' handles both.
# The 'find' command lists every entry whose name contains a space, down to depth $i. The | (pipe) …
# … forwards the list to a 'while' loop that reads it one path at a time.
find . -maxdepth "$i" -name '* *' | while IFS= read -r file
do
# Here, we use 'sed' to replace spaces in the filename with underscores.
# The 'echo' prints a message to the console before renaming the file using 'mv'.
item=`echo "$file" | sed 's/ /_/g'`
echo "Renaming '$file' to '$item'"
mv "$file" "$item"
done
done
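The loop above needs one find pass per depth level because renaming a directory changes the paths of everything below it. A single-pass alternative, sketched here as a hypothetical function underscore_names, uses find -depth so that entries are listed before their parent directory, and renames only the last path component. It assumes Bash (for read -d '' and ${var// /_}) and a find that supports -mindepth and -print0, as both GNU and BSD find do:

```bash
#!/bin/bash
# Hypothetical helper: replace spaces with underscores in every file and
# directory name below ROOT, in a single find pass.
underscore_names() {
  local path dir base
  # -depth lists a directory's contents before the directory itself, so children
  # are renamed while their parent still has its old name.
  # -print0 together with read -d '' keeps names with spaces or newlines intact.
  find "$1" -mindepth 1 -depth -name '* *' -print0 |
  while IFS= read -r -d '' path; do
    dir=${path%/*}      # parent path, not renamed yet at this point
    base=${path##*/}    # last component, the only part renamed here
    echo "Renaming '$path' to '$dir/${base// /_}'"
    mv -- "$path" "$dir/${base// /_}"
  done
}
```

Usage: underscore_names LITERATURE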
#!/bin/bash
# Here, we go through all the directories (up to a depth of 20).
find . -maxdepth 20 -type d | while IFS= read -r D
do
# Check if the parent directory contains any files of interest.
if ls "$D"/Part*.txt &>/dev/null ||
ls "$D"/Chapter*.txt &>/dev/null ||
ls "$D"/Section*.txt &>/dev/null ||
ls "$D"/Book*.txt &>/dev/null
then
# If we get here, then there are split files in the directory; we will concatenate them.
# First, we trim the full directory path ($D) so that we are left with the path to the …
# … files' parent's parent directory. We will write the concatenated file here. (✝)
ppdir="$(dirname "$D")"
# Here, we concatenate the files using 'cat'. The 'basename' command extracts the name of …
# … the parent directory from the full directory path ($D) and gives us the filename.
# Finally, we write the concatenated file to its parent's parent directory. (✝)
cat "$D"/*.txt > "$ppdir/$(basename "$D").txt"
fi
done
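The loop above depends on the first script having removed all spaces, since iterating over find output with for splits on whitespace. A sketch that works on the original names, assuming the same four file-name patterns and a hypothetical function name concat_split_books, reads NUL-separated directory names and uses Bash's nullglob to collect the split files:

```bash
#!/bin/bash
# Hypothetical helper: for every directory below ROOT that holds split files
# (Part*.txt, Chapter*.txt, Section*.txt or Book*.txt), concatenate them in glob
# order and write "<parent's parent>/<parent name>.txt".
# The body runs in a subshell ( ... ) so that nullglob does not leak to the caller.
concat_split_books() (
  shopt -s nullglob   # patterns that match nothing expand to nothing
  find "$1" -mindepth 1 -type d -print0 |
  while IFS= read -r -d '' dir; do
    parts=("$dir"/Part*.txt "$dir"/Chapter*.txt "$dir"/Section*.txt "$dir"/Book*.txt)
    if [ "${#parts[@]}" -gt 0 ]; then
      out="${dir%/*}/${dir##*/}.txt"
      echo "Writing '$out'"
      cat "${parts[@]}" > "$out"
    fi
  done
)
```

Unlike the script above, this concatenates only the matching split files rather than every *.txt in the directory, and it skips the root directory itself (-mindepth 1).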
Now, we delete all the files that we concatenated so that their parent directory is left empty.
find . -type f -name 'Part*.txt' -delete
find . -type f -name 'Chapter*.txt' -delete
find . -type f -name 'Section*.txt' -delete
find . -type f -name 'Book*.txt' -delete
The following command deletes empty directories. (✝) This is why we wrote each concatenated file to its parent's parent directory: once the split files are deleted, their directory is empty and gets removed.
find . -type d -empty -delete
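Deletion by name pattern is the riskiest step, so it helps to keep it as narrow as the patterns used to detect split files. A sketch (hypothetical function remove_split_files) that removes only regular .txt files matching those patterns in a single find, then prunes empty directories below the root without touching the root itself. It is still pattern-based, so running the first find with -print instead of -delete is a sensible preview:

```bash
#!/bin/bash
# Hypothetical helper: delete the split chapter files below ROOT, then remove
# directories that became empty. The root directory itself is never deleted.
remove_split_files() {
  local root=$1
  # Only regular files whose names match one of the split-file patterns.
  find "$root" -type f \( -name 'Part*.txt' -o -name 'Chapter*.txt' \
    -o -name 'Section*.txt' -o -name 'Book*.txt' \) -delete
  # -delete implies -depth, so nested directories are checked bottom-up.
  find "$root" -mindepth 1 -type d -empty -delete
}
```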
[Updated March 6th—Rephrased the question/answer so that the question/answer is easy to find and understand.]
Upvotes: 0
Reputation: 107040
The shell doesn't like whitespace in file names. However, over the years, Unix has come up with some tricks that help:
$ find . -name "Chapters*.txt" -type f -print0 | xargs -0 cat >> final_file.txt
This might do what you want.
The find command recursively finds all of the directory entries in a file tree that match the query (in this case, the type must be a file, and the name must match the pattern Chapters*.txt).
Normally, find separates the entry names with a newline (NL), but -print0 tells it to separate them with the NUL character instead. NL is a valid character in a file name, but NUL isn't.
The xargs command takes the output of find and processes it: xargs gathers all the names and passes them in bulk to the command you give it, in this case cat.
Normally, xargs separates names by whitespace, which means Chapters would be one file and 01-05.txt would be another. However, -0 tells xargs to use NUL as the separator, which is exactly what -print0 produces.
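The splitting behavior described above is easy to see in isolation. A small demonstration, using printf to stand in for find's output (the variable names are only for illustration):

```bash
#!/bin/bash
# Without -0, xargs splits its input on blanks and newlines,
# so one file name becomes two arguments.
split=$(printf 'Chapters 01-05.txt\n' | xargs printf '[%s]\n')
# With NUL-terminated input (what find -print0 emits) and xargs -0,
# the name stays a single argument.
whole=$(printf 'Chapters 01-05.txt\0' | xargs -0 printf '[%s]\n')
printf '%s\n' "$split" "$whole"
```

The first command yields [Chapters] and [01-05.txt] on separate lines; the second yields [Chapters 01-05.txt].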
Upvotes: 0
Reputation: 10039
cat Chapters*.txt > FinaleFile.txt.raw
Chapters="$( ls -1 Chapters*.txt | sed -n 'H;${x;s/\
//g;s/ *Chapters //g;s/\.txt/ /g;s/ *$//p;}' )"
mv FinaleFile.txt.raw "FinaleFile ${Chapters}.txt"
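The sed script joins the ls output into one line and strips the Chapters prefixes and .txt extensions, so with the three example files the result is named FinaleFile 01-05 06-10 11-end.txt. The same suffix can be built without sed. A sketch using Bash parameter expansion, where chapter_suffix is a hypothetical name and the Chapters prefix and .txt extension are assumed as in the question:

```bash
#!/bin/bash
# Hypothetical helper: turn "Chapters 01-05.txt" "Chapters 06-10.txt" ...
# into the single string "01-05 06-10 ...".
chapter_suffix() {
  local f name suffix=
  for f in "$@"; do
    name=${f##*/}            # strip any directory part
    name=${name#Chapters }   # strip the "Chapters " prefix
    name=${name%.txt}        # strip the extension
    suffix+="${suffix:+ }$name"   # add a separating space after the first item
  done
  printf '%s\n' "$suffix"
}
```

Usage, in place of the sed line: mv FinaleFile.txt.raw "FinaleFile $(chapter_suffix Chapters*.txt).txt"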
Upvotes: 0
Reputation: 189447
It's not clear what you mean by "recursively", but this should be enough to get you started.
#!/bin/bash
titlecase () { # adapted from http://stackoverflow.com/a/6969886/874188
local arr
arr=("${@,,}")
echo "${arr[@]^}"
}
for book in LITERATURE/?/*/*; do
title=$(titlecase ${book##*/})
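# Only *.txt: the redirection after 'done' creates "$book/$title" before this glob expands, so '*' would feed the output file back into itself.
for file in "$book"/*.txt; do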
cat "$file"
echo
done >"$book/$title"
echo '# not doing this:' rm "$book"/*.txt
done
This loops over LITERATURE/initial/author/BOOK TITLE and creates a file Book Title (where should a space be added?) from the concatenated files in each book directory. (I would generate it in the parent directory and then remove the book directory completely, assuming it contains nothing of value any longer.) There is no recursion, just a loop over this directory structure.
Removing the chapter files is a bit risky, so I'm not doing it here. You could remove the echo '# not doing this:' prefix from the line after the first done to enable it.
If you have book names which contain an asterisk or some other shell metacharacter, this will be rather more complex: the title assignment assumes you can use the book title unquoted.
Only the parameter expansion with case conversion is beyond the very basics of Bash. The array operations could perhaps also be a bit scary if you are a complete beginner. Proper understanding of quoting is also often a challenge for newcomers.
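One portability caveat: the ${@,,} and ${arr[@]^} case conversions were added in Bash 4, while the /bin/bash shipped with OS X (including Lion) is version 3.2, so titlecase as written fails there with a "bad substitution" error. A sketch of a drop-in replacement that uses awk's toupper and tolower instead; because it takes the title as one quoted argument, the call can become title=$(titlecase "${book##*/}"), which also sidesteps the metacharacter problem:

```bash
#!/bin/bash
# Sketch of a Bash 3.2-compatible titlecase: lowercase each word,
# then uppercase its first character. Words are split on whitespace by awk.
titlecase() {
  printf '%s\n' "$*" |
    awk '{ for (i = 1; i <= NF; i++) $i = toupper(substr($i, 1, 1)) tolower(substr($i, 2)); print }'
}
```

For example, titlecase "THE BROTHER OF DAPHNE" prints The Brother Of Daphne.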
Upvotes: 1