tenraek

Reputation: 13

How to create a directory from multiple parts of a file name

We recently exported patient records from our old EMR system. The trouble is that every note for every patient came out as its own PDF file, resulting in 876,000+ PDFs in one directory, all with a long, cumbersome file name format of ID#-record#.YYYY-MM-DD HH.MM.SS.FIRSTNAME LASTNAME.TYPE OF NOTE.pdf

My first goal is to get all the files into patient directories named ID#-FIRSTNAME LASTNAME.

For example, for the file named

345-1.2011-02-3 08.59.53.JOHN DOE.General Miscellaneous Service.pdf 

a directory called 345-JOHN DOE would be created, and every file whose name starts with 345- would be moved into it.
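For reference, the desired mapping from one file name to its directory name can be sketched with parameter expansion and `read` alone. This assumes the patient name never contains a dot, since the format uses dots as field separators; the function name `dir_for` is hypothetical:

```shell
# Hypothetical helper: prints the target directory name for one exported file.
# Assumes FIRSTNAME LASTNAME contains no "." (dots separate the fields).
dir_for() {
    local f=${1##*/}            # drop any leading path such as ./
    local id=${f%%-*}           # everything before the first "-", e.g. 345
    local -a parts
    IFS=. read -r -a parts <<< "$f"
    # Fields: [0]=345-1 [1]=2011-02-3 08 [2]=59 [3]=53 [4]=JOHN DOE ...
    printf '%s-%s\n' "$id" "${parts[4]}"
}

dir_for "345-1.2011-02-3 08.59.53.JOHN DOE.General Miscellaneous Service.pdf"
# prints: 345-JOHN DOE
```

Counting fields from the left works here because the timestamp always contributes the same number of dots; a name such as "JOHN Q. DOE" would shift the fields and break it.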

I know I can use a script like

for file in ./*_???ILN*; do
    dir=${file%ILN*}
    dir=${dir##*_}
    mkdir -p "./$dir" &&
    mv -iv "$file" "./$dir"
done

In that example, the script takes the value between the _ and ILN and creates a directory named after just that value. But how, if possible, can I combine the ID# value and the FIRSTNAME LASTNAME value to name the directory?
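To see why that script only yields one value, here is what its two expansions do to a hypothetical file name that matches the ./*_???ILN* glob. Each expansion trims from one end only, so the result is a single contiguous piece of the name:

```shell
# Hypothetical file name matching the ./*_???ILN* glob from the question.
file="./scan_042ILN.pdf"

dir=${file%ILN*}   # strip the shortest suffix matching ILN*  -> ./scan_042
dir=${dir##*_}     # strip the longest prefix matching *_     -> 042
echo "$dir"        # prints: 042
```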

Upvotes: 1

Views: 108

Answers (1)

melpomene

Reputation: 85767

You could use a regex like this:

for i in *.pdf; do
    if [[ "$i" =~ ^([0-9]+)-[0-9]+\.[0-9]{4}-[0-9]{2}-[0-9]{1,2}\ [0-9]{2}\.[0-9]{2}\.[0-9]{2}\.([^.]+)\. ]]; then
        id="${BASH_REMATCH[1]}"
        name="${BASH_REMATCH[2]}"
        subdir="$id-$name"
        mkdir -p -- "$subdir"
        mv -- "$i" "$subdir"
    else
        echo "couldn't parse file name: $i" >&2
    fi
done
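With 876,000+ real records, it may be worth trying the loop on a throwaway directory first. The sketch below uses made-up sample names (hypothetical patients) and keeps the regex in a variable, a common bash idiom that avoids quoting surprises inside [[ ]]; otherwise it follows the same logic as the loop above:

```shell
# Build a scratch directory with a few hypothetical sample files.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

touch "345-1.2011-02-3 08.59.53.JOHN DOE.General Miscellaneous Service.pdf" \
      "345-2.2011-03-14 10.05.00.JOHN DOE.Lab Results.pdf" \
      "78-1.2012-11-30 16.20.41.JANE ROE.Office Visit.pdf" \
      "badly named file.pdf"

# Same pattern as above; the unquoted $re on the right of =~ is treated as a regex.
re='^([0-9]+)-[0-9]+\.[0-9]{4}-[0-9]{2}-[0-9]{1,2} [0-9]{2}\.[0-9]{2}\.[0-9]{2}\.([^.]+)\.'
for i in *.pdf; do
    if [[ "$i" =~ $re ]]; then
        subdir="${BASH_REMATCH[1]}-${BASH_REMATCH[2]}"
        mkdir -p -- "$subdir"
        mv -- "$i" "$subdir"
    else
        echo "couldn't parse file name: $i" >&2
    fi
done

ls -R "$scratch"   # inspect the result before running on the real export
```

The unparseable file is reported on stderr and left where it is, so anything that does not follow the export format is easy to spot afterwards.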

Bash (since version 3) supports the =~ (regex match) operator in [[ ]], which places the substrings captured by ( ) groups in the BASH_REMATCH array. This is very convenient for extracting information from formatted strings.
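A minimal illustration of how BASH_REMATCH is filled: index 0 holds the whole match, and each following index holds one ( ) group, counted from the left:

```shell
f="345-1.2011-02-3 08.59.53.JOHN DOE.General Miscellaneous Service.pdf"
re='^([0-9]+)-([0-9]+)\.'

if [[ $f =~ $re ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"   # 345-1.
    echo "group 1:     ${BASH_REMATCH[1]}"   # 345 (patient ID)
    echo "group 2:     ${BASH_REMATCH[2]}"   # 1   (record number)
fi
```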

Note that this effectively groups files by their ID/name combination, not by ID alone. If some files share an ID but have a different name (for example, a misspelling in one note), they will be put in different subdirectories.
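To find such cases after the move, one option is to scan the resulting ID-NAME directories and report any ID that appears under more than one name. This is a sketch, not part of the answer above: the function name `report_id_conflicts` is hypothetical, and it needs bash 4+ for associative arrays:

```shell
# Hypothetical helper (bash 4+): report IDs that ended up under several names.
# Usage: report_id_conflicts [directory]   (defaults to the current directory)
report_id_conflicts() {
    local root=${1:-.} d base id list
    local -A names_for_id=()
    for d in "$root"/[0-9]*-*/; do
        [[ -d $d ]] || continue            # glob matched nothing
        base=${d%/}; base=${base##*/}      # e.g. 345-JOHN DOE
        id=${base%%-*}                     # e.g. 345
        names_for_id[$id]+="${base#*-}|"   # collect names, "|"-separated
    done
    for id in "${!names_for_id[@]}"; do
        list=${names_for_id[$id]%|}        # drop the trailing separator
        if [[ $list == *"|"* ]]; then
            echo "ID $id has several names: ${list//|/, }"
        fi
    done
}
```

Any ID it reports can then be reviewed by hand and its directories merged if the names turn out to belong to the same patient.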

Upvotes: 1
