How do I move files into folders with similar names in Unix?

Question

I'm sorry if this question has been asked before; I just didn't know how to word it as a search query.

I have a set of folders that look like this:

Brain - Amygdala/                 Brain - Spinal cord (cervical c-1)/  Skin - Sun Exposed (Lower leg)/
Brain - Caudate (basal ganglia)/  Lung/                                Whole Blood/

I also have a set of files that look like this:

Brain_Amygdala.v7.covariates_output.txt                  Skin_Not_Sun_Exposed_Suprapubic.v7.covariates_output.txt
Brain_Caudate_basal_ganglia.v7.covariates_output.txt     Skin_Sun_Exposed_Lower_leg.v7.covariates_output.txt
Brain_Spinal_cord_cervical_c-1.v7.covariates_output.txt  Whole_Blood.v7.covariates_output.txt

As you can see, the file names do not exactly match the directory names. For example, Brain_Amygdala.v7.covariates_output.txt does not match Brain - Amygdala/: even if we cut the tissue name out of the covariates file name, Brain_Amygdala is formatted differently from its corresponding folder.

The same goes for Whole Blood/: it differs from Whole_Blood.v7.covariates_output.txt, even after isolating the tissue name Whole_Blood from the file name.

What I want to do is move each of these tissue files into its corresponding folder. Notice that each covariates file is named after its tissue: the tissue name is everything before the first dot (.) in the file name, with words separated by underscores (_). My idea was to split out the words before the first . in the file name so that I can easily move the file into its corresponding folder.

e.g.

Brain_Amygdala.v7.covariates_output.txt -> Brain*Amygdala [mv]-> Brain*Amygdala/

a) I'm not sure how to isolate the part of a file name leading up to the first .

b) Even if I did that, I don't know how to insert a wildcard between the words and match the result against the corresponding folder.

However, I am completely open to other ways of doing something like this.

gniourf_gniourf · Accepted Answer

Not a full answer, but it should address some of your concerns:

a) To isolate the part of a string leading up to the first ., use parameter expansion (see the Shell Parameter Expansion section of the Bash reference manual):

string=Brain_Amygdala.v7.covariates_output.txt
until_dot=${string%%.*}
echo "$until_dot"

will output Brain_Amygdala (which we saved in the variable until_dot).
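
To see why the doubled %% matters, here is a small comparison (the variable names until_first_dot and until_last_dot are only illustrative): % removes the shortest suffix matching .*, while %% removes the longest one.

```bash
#!/usr/bin/env bash
string=Brain_Amygdala.v7.covariates_output.txt

# %%.* removes the LONGEST suffix matching ".*": everything from the first dot on
until_first_dot=${string%%.*}
# %.* removes the SHORTEST suffix matching ".*": only the last extension
until_last_dot=${string%.*}

printf '%s\n' "$until_first_dot"   # Brain_Amygdala
printf '%s\n' "$until_last_dot"    # Brain_Amygdala.v7.covariates_output
```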

b) You may want to use the ${parameter/pattern/string} parameter expansion:

# Replace every non-alphabetic character with the glob character *
glob_pattern=${until_dot//[^[:alpha:]]/*}
echo "$glob_pattern"

With the same variables as above, this will output Brain*Amygdala
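
One thing to watch for: a folder such as Brain - Caudate (basal ganglia) ends with ), which the pattern Brain*Caudate*basal*ganglia does not cover. The sketch below checks the generated patterns against the folder names from the question as plain strings, using [[ ]] pattern matching so no real folders are needed; the arrays names, folders and results are hypothetical test data copied from the question.

```bash
#!/usr/bin/env bash
# Hypothetical test data copied from the question (tissue names and folder names)
names=(Brain_Amygdala Brain_Caudate_basal_ganglia Brain_Spinal_cord_cervical_c-1 Whole_Blood)
folders=('Brain - Amygdala' 'Brain - Caudate (basal ganglia)'
         'Brain - Spinal cord (cervical c-1)' 'Whole Blood')

results=()
for i in "${!names[@]}"; do
    # Replace every non-alphabetic character with the glob character *
    glob_pattern=${names[i]//[^[:alpha:]]/*}
    # Inside [[ ]], an unquoted right-hand side of == is matched as a pattern
    if [[ ${folders[i]} == $glob_pattern ]]; then exact=yes; else exact=no; fi
    if [[ ${folders[i]} == $glob_pattern* ]]; then trailing=yes; else trailing=no; fi
    results+=("$glob_pattern exact=$exact trailing_star=$trailing")
done
printf '%s\n' "${results[@]}"
```

Only the Caudate folder fails without the extra trailing *, which is why appending * before the / when globbing for the directories is the safer choice.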

c) To put it all together, it's probably a good idea to determine the possible targets first and do some basic checks:

# Use nullglob to have non matching glob expand to nothing
shopt -s nullglob
# DO NOT QUOTE THE FOLLOWING EXPANSION:
# the variable is actually a glob!
# The trailing */ matches only directories, and the * before it also
# allows folder names with extra characters at the end, e.g. "(basal ganglia)"
dirs=( $glob_pattern*/ )

# Now check how many matches there are:
if ((${#dirs[@]} == 0)); then
    echo >&2 "No matches for $glob_pattern"
elif ((${#dirs[@]} > 1)); then
    echo >&2 "More than one match for $glob_pattern: ${dirs[*]}"
else
    echo "All good!"
    # Remove the echo to actually perform the move
    echo mv "$string" "${dirs[0]}"
fi
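
The snippet above handles one file at a time. A minimal sketch that wraps the same steps in a loop over every *.covariates_output.txt file in the current directory could look like this; the function name move_covariate_files and its --dry-run flag are hypothetical, and it assumes the files and the folders sit in the same directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: move each *.covariates_output.txt file into the single
# directory whose name matches the glob built from its tissue name.
# Usage: move_covariate_files [--dry-run]
move_covariate_files() {
    local dry_run=0 file until_dot glob_pattern
    local -a dirs
    [[ $1 == --dry-run ]] && dry_run=1

    # Non-matching globs expand to nothing (note: this stays set in the caller's shell)
    shopt -s nullglob
    for file in *.covariates_output.txt; do
        until_dot=${file%%.*}                        # e.g. Whole_Blood
        glob_pattern=${until_dot//[^[:alpha:]]/*}    # e.g. Whole*Blood
        # Unquoted on purpose so it is expanded as a glob; the trailing */
        # keeps only directories and allows extra characters such as ")"
        dirs=( $glob_pattern*/ )

        if (( ${#dirs[@]} == 0 )); then
            echo >&2 "No folder matches $glob_pattern (from $file)"
        elif (( ${#dirs[@]} > 1 )); then
            echo >&2 "More than one folder matches $glob_pattern: ${dirs[*]}"
        elif (( dry_run )); then
            echo mv -- "$file" "${dirs[0]}"
        else
            mv -- "$file" "${dirs[0]}"
        fi
    done
}
```

Run it once with --dry-run from the directory that holds the files, check the printed mv commands and any warnings (for example, Skin_Not_Sun_Exposed_Suprapubic has no matching folder in the listing above), then run it again without the flag.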

I don't know how well your data will actually conform to these patterns, but I hope this answers some of your questions! (To learn more about parameter expansions, do read, and experiment with, the Shell Parameter Expansion section of the Bash reference manual.)
