Reputation: 1652

Extracting sub-strings in Unix

I'm using Cygwin on Windows 7. I want to loop through a folder of about 10,000 files and run a signal processing tool's operation on each file. The problem is that the file names have some extra characters that are not compatible with the operation, so I need to extract just a certain part of each file name.

For example, if the file name is abc123456_justlike.txt.rna, I need to use abc123456_justlike.txt. How should I write a loop that goes through each file and performs the operation on the shortened file name?

I tried the cut -b1-10 command, but that doesn't let my tool perform the necessary operation. I'd appreciate help with this problem.

Upvotes: 0

Answers (3)

Kaz

Reputation: 58617

Try some shell scripting, using the ${NAME%TAIL} parameter substitution: the contents of variable NAME are expanded, but the shortest suffix that matches the TAIL glob pattern is chopped off.

$ NAME=abc12345.txt.rna
$ echo "${NAME%.rna}"  # prints abc12345.txt
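The % operator removes the shortest suffix that matches the pattern, while %% removes the longest one. A small sketch (the variable names short and long are just for illustration) shows the difference when the pattern contains a wildcard:

```shell
#!/bin/sh
NAME=abc123456_justlike.txt.rna
short=${NAME%.*}   # shortest suffix matching ".*" (".rna") removed
long=${NAME%%.*}   # longest suffix matching ".*" (".txt.rna") removed
printf '%s\n%s\n' "$short" "$long"
```

So ${x%.*} strips whatever the last extension is, without hard-coding ".rna".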

# process all files in the directory, taking off their .rna suffix
$ for x in *; do signal_processing_tool "${x%.rna}"; done
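A slightly more defensive version of the loop, as a sketch: it only matches *.rna files, quotes the shortened name so names containing spaces survive, and skips the unexpanded pattern when no file matches. signal_processing_tool is the placeholder from the answer, not a real command, so the sketch defines a hypothetical stand-in that only prints its argument; process_dir is likewise a made-up name.

```shell
#!/bin/sh
# Hypothetical stand-in for the real tool; replace it with the actual command.
signal_processing_tool() {
    printf 'processing %s\n' "$1"
}

# Run the tool once per .rna file in a directory (default: .),
# passing the name with its trailing ".rna" removed.
process_dir() {
    dir=${1:-.}
    for f in "$dir"/*.rna; do
        # With no matches, the glob stays as the literal "*.rna"; skip it.
        [ -e "$f" ] || continue
        signal_processing_tool "${f%.rna}"   # a.txt.rna -> a.txt
    done
}
```

For example, process_dir /path/to/folder runs the tool on every .rna file in that folder.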

If there are variations among the file names, you can classify them with a case statement:

for x in * ; do
  case $x in 
     *.rna ) 
        # do something with .rna files
        ;;
     *.txt )
        # do something else with .txt files
        ;;
     * )
        # default catch-all-else case
        ;;
  esac
done
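To make the skeleton concrete, here is a sketch with a hypothetical classify function that just prints a label for a name. Patterns are tried top to bottom and the first match wins, so the more specific *.txt.rna has to come before *.rna:

```shell
#!/bin/sh
# Print an illustrative label for a file name.
classify() {
    case $1 in
        *.txt.rna) echo "rna, base name ${1%.rna}" ;;
        *.rna)     echo "rna" ;;
        *.txt)     echo "txt" ;;
        *)         echo "other" ;;
    esac
}
```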

Upvotes: 2

krlmlr

Reputation: 25474

Try sed:

echo a.b.c | sed 's/\.[^.]*$//'

The s command in sed performs a search-and-replace operation. In this case it replaces the regular expression \.[^.]*$ (meaning: a dot, followed by any number of non-dots, at the end of the string) with the empty string, so a.b.c becomes a.b.

If you are not yet familiar with regular expressions, this is a good point to learn them. I find manipulating strings with regular expressions much more straightforward than using tools like cut (or their equivalents).
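Wrapped in a small function (strip_last_ext is a hypothetical name), the same sed expression can be applied to each file name, for example short=$(strip_last_ext "$f") inside a loop:

```shell
#!/bin/sh
# Remove the last ".extension" from a name with the sed expression above.
# printf is used instead of echo so names starting with "-" print as-is.
strip_last_ext() {
    printf '%s\n' "$1" | sed 's/\.[^.]*$//'
}
```

Each call starts a sed process, so for thousands of files the shell's own ${f%.*} gives the same result without spawning one per file.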

Upvotes: 2

Teja

Reputation: 13534

If you are trying to extract the list of file names from a directory, use the command below. It lists the names oldest first and keeps only the first 10 characters of each:

ls -1tr | cut -c1-10
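Parsing ls output breaks on names that contain spaces, and a fixed cut -c1-10 only works when every wanted prefix is exactly 10 characters long. As an alternative sketch (list_short_names is a hypothetical name), a shell glob lists the files directly and ${name%.rna} drops just the suffix, whatever the name's length. Unlike ls -tr, it lists names alphabetically rather than by modification time.

```shell
#!/bin/sh
# Print the shortened names of all .rna files in a directory (default: .).
list_short_names() {
    dir=${1:-.}
    for f in "$dir"/*.rna; do
        [ -e "$f" ] || continue      # no matches: skip the literal pattern
        name=${f##*/}                # drop the leading directory part
        printf '%s\n' "${name%.rna}" # drop the trailing ".rna"
    done
}
```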

Upvotes: 0
