Reputation: 23
I have thousands of writings in plain text format moved to a single directory.
In the titles, some have spaces, some start with -, some have single/double quotes, & basically every other valid Windows & Linux filename character is in the titles.
The content text contains both Windows & Linux line endings (right, that's what they're called?).
In Linux/Bash, how do I concatenate all these files (half are extension-less, half are .txt's) into one file, sorted by modification date, with filename & file date neatly printed before each file's content?
If you could, please tell me how to do the same thing in a nested file structure, too, this time with the file paths printed for each file, besides filename & file modification date.
I would appreciate this greatly, this is for years of my very own writing, & I've been searching & struggling for a few hours now. I'm a writer not a programmer =)
Thanks for considering.
Upvotes: 1
Views: 600
Reputation: 46883
If you have some GNU goodies and dos2unix:
find -type f -printf "%T@ %p\0" | sort -nz | while IFS= read -r -d '' l; do f=${l#* }; printf '%s %s\n' "$(date -r "$f")" "$f"; dos2unix < "$f"; echo; done
Should do the job and be 100% safe regarding all the funny filenames you might have. Works recursively. Sorry for the long one-liner but it's bedtime!
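For readability, the same pipeline can be spread over several lines. What follows is only a sketch under a few assumptions: the function name concat_by_mtime and the header layout are made up for illustration, GNU find, sort, date and sed are assumed, and sed 's/\r$//' is swapped in for dos2unix so that it runs where dos2unix is not installed. It prints the filename, the path and the modification date before each file's content.

```shell
#!/usr/bin/env bash
# Sketch only: concat_by_mtime is a hypothetical name, not part of the answer.
# Assumes GNU find/sort/date/sed (for -printf, -z, -r FILE and \r).
concat_by_mtime() {
    local dir=${1:-.} record path
    # Emit NUL-terminated "<epoch-mtime> <path>" records, sort them
    # numerically (oldest first), then read them back one at a time.
    find "$dir" -type f -printf '%T@ %p\0' |
        sort -zn |
        while IFS= read -r -d '' record; do
            path=${record#* }                 # drop the timestamp and the first space
            printf '===== %s =====\n' "${path##*/}"
            printf 'Path:     %s\n' "$path"
            printf 'Modified: %s\n\n' "$(date -r "$path" '+%Y-%m-%d %H:%M:%S')"
            sed 's/\r$//' < "$path"           # assumption: stand-in for dos2unix
            printf '\n'
        done
}
```

Calling it as concat_by_mtime . > ../all.txt keeps the output file outside the tree being read, so it cannot be picked up by find.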
Edit.
Regarding your .fuse_hidden_blahblah file: I have no idea why this file is here, or why some content is recursively being added to itself. I'm sure you can safely ignore it by asking find to explicitly ignore it:
find \! -name '.fuse_hidden*' -type f -printf "%T@ %p\0" | sort -nz | while IFS= read -r -d '' l; do f=${l#* }; printf '%s %s\n' "$(date -r "$f")" "$f"; dos2unix < "$f"; echo; done
By the way, the content is displayed on the terminal screen. If you want to redirect it into a file mycatedfile.txt, then:
find \! -name 'mycatedfile.txt' \! -name '.fuse_hidden*' -type f -printf "%T@ %p\0" | sort -nz | while IFS= read -r -d '' l; do f=${l#* }; printf '%s %s\n' "$(date -r "$f")" "$f"; dos2unix < "$f"; echo; done > "mycatedfile.txt"
Upvotes: 1
Reputation: 81012
Using this fantastic answer (to avoid things like parsing ls output) gets you something like this (for a single directory):
sorthelper=()
for file in *; do
    # Skip the output file in case it is left over from an earlier run.
    [ "$file" = Output.txt ] && continue
    # We need something that can easily be sorted.
    # Here, we use "<date> <filename>".
    # Note that this works with any special characters in filenames
    # Use exactly ONE of the two stat lines, depending on your platform:
    # sorthelper+=("$(stat -n -f "%Sm %N" -t "%Y%m%d%H%M%S" -- "$file")") # Mac OS X only
    sorthelper+=("$(stat --printf "%Y %n" -- "$file")") # Linux only
done
sorted=()
while IFS= read -r -d '' elem; do
    # this strips away everything up to the first space (<date>)
    sorted+=("${elem#* }")
done < <(printf '%s\0' "${sorthelper[@]}" | sort -zn)
for file in "${sorted[@]}"; do
    if [ -f "$file" ]; then
        # printf and "--" keep filenames that start with "-" from being read as options
        printf '%s\n' "$file"
        cat -- "$file"
    fi
done > Output.txt
For a nested hierarchy, use for file in **; do in shells that support it (bash version 4+ with shopt -s globstar enabled, and zsh, that I'm aware of), or put the above into a function and call it recursively on directories in the loop (the code below is entirely untested).
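catall() {
    # Directory to process; defaults to the current one.
    local dir=${1:-.} file elem
    local sorthelper=() sorted=()
    for file in "$dir"/*; do
        # An empty directory leaves the glob unexpanded; skip that case.
        [ -e "$file" ] || continue
        # We need something that can easily be sorted.
        # Here, we use "<date> <filename>".
        # Note that this works with any special characters in filenames
        # Use exactly ONE of the two stat lines, depending on your platform:
        # sorthelper+=("$(stat -n -f "%Sm %N" -t "%Y%m%d%H%M%S" -- "$file")") # Mac OS X only
        sorthelper+=("$(stat --printf "%Y %n" -- "$file")") # Linux only
    done
    while IFS= read -r -d '' elem; do
        # this strips away everything up to the first space (<date>)
        sorted+=("${elem#* }")
    done < <(printf '%s\0' "${sorthelper[@]}" | sort -zn)
    for file in "${sorted[@]}"; do
        if [ -f "$file" ]; then
            printf '%s\n' "$file"
            cat -- "$file"
        elif [ -d "$file" ]; then
            # Recurse into the subdirectory (the path is passed along explicitly).
            catall "$file"
        fi
    done
}
$ catall > ../Output.txt   # write outside the tree so the output is not read back in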
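The ** alternative mentioned above could look roughly like this. It is a sketch, assuming bash 4 or later; list_files_recursively is a hypothetical helper name. Note that ** does not match hidden files unless dotglob is also set, and the loop still needs the -f test because ** matches directories too.

```shell
#!/usr/bin/env bash
# Sketch only: list_files_recursively is a hypothetical helper name.
# globstar makes ** match files and directories at any depth (bash 4+);
# nullglob makes a pattern with no matches expand to nothing.
shopt -s globstar nullglob

list_files_recursively() {
    local path
    for path in ./**; do
        [ -f "$path" ] || continue    # ** also yields directories; keep regular files
        printf '%s\n' "$path"
    done
}
```

The ./ prefix means no resulting path can start with "-", which sidesteps the option-parsing problem for commands that later receive these names.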
Edit: As noticed in gniourf_gniourf's excellent answer, I failed to account for the varied line endings in your input files. Using dos2unix <"$file" instead of cat "$file" in the above should normalize them, as was indicated.
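If dos2unix might not be installed everywhere, a small wrapper can fall back to sed. This is a sketch with a hypothetical function name (normalize_eol); the sed fallback assumes GNU sed, which understands \r, and only strips a carriage return at the end of a line.

```shell
#!/usr/bin/env bash
# Sketch only: normalize_eol is a hypothetical helper name.
# Prints the file given as $1 with Windows line endings (CR LF) turned into LF.
normalize_eol() {
    if command -v dos2unix >/dev/null 2>&1; then
        dos2unix < "$1"              # reads stdin, writes stdout
    else
        sed 's/\r$//' < "$1"         # assumption: GNU sed (supports \r)
    fi
}
```

Because the file is fed through a redirection instead of being passed as an argument, a name starting with "-" cannot be mistaken for an option.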
Edit again: Hm... just noticed that this doesn't include the modification times in the output. The simplest way to get that into the output is also the costliest (fetch it again at output time), but a solution like the one employed in gniourf_gniourf's answer will work here as well (drop the sorthelper to sorted loop and use the timestamp in the final loop to write it to the file).
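A rough sketch of that last idea, for a single directory: keep the "<timestamp> <filename>" records, sort them, and split each record in the output loop so the timestamp is not fetched a second time. The function name cat_sorted_with_dates is hypothetical, GNU stat, sort, date and sed are assumed, and sed again stands in for dos2unix.

```shell
#!/usr/bin/env bash
# Sketch only: cat_sorted_with_dates is a hypothetical name.
# Assumes GNU stat (--printf), GNU sort (-z), GNU date (-d @epoch), GNU sed (\r).
cat_sorted_with_dates() {
    local dir=${1:-.} file record stamp
    local records=()
    for file in "$dir"/*; do
        [ -f "$file" ] || continue
        # "<epoch seconds> <name>": numeric prefix first, so it sorts by date
        records+=("$(stat --printf '%Y %n' -- "$file")")
    done
    [ "${#records[@]}" -gt 0 ] || return 0
    while IFS= read -r -d '' record; do
        stamp=${record%% *}           # text before the first space
        file=${record#* }             # text after the first space
        printf '%s  %s\n' "$(date -d "@$stamp" '+%Y-%m-%d %H:%M:%S')" "$file"
        sed 's/\r$//' < "$file"       # assumption: stand-in for dos2unix
        printf '\n'
    done < <(printf '%s\0' "${records[@]}" | sort -zn)
}
```

Usage would be something like cat_sorted_with_dates . > ../Output.txt, again writing the result outside the directory being read.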
Upvotes: 0