Achal Neupane

Reputation: 5719

How do I loop over multiple files to extract specific columns and save as separate files?

I have numerous *.txt files. I want to extract columns 3 and 5 from each of these files and save them as new files, keeping their original names with a new_ prefix. I wrote the bash loop below to try to do this, but it doesn't do what I want. Can someone please help me with this?

for i in *.txt; do
cut -f 3,5 $i  > /media/owner/new_$i_assembly.txt 
done

Upvotes: 0

Views: 632

Answers (2)

PesaThe

Reputation: 7499

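You have to tell Bash explicitly where the variable name ends. Underscores are valid in variable names, so in $i_assembly Bash expands a variable named i_assembly (which is unset, so you get an empty string) instead of $i. Use braces to delimit the name, and quote the expansions: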

for i in *.txt; do
   cut -f 3,5 "$i"  > "/media/owner/new_${i}_assembly.txt" 
done
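
To see the difference in isolation, here is a minimal sketch. The filename sample.txt is only an illustration, and it assumes no variable named i_assembly is set:

```shell
# Hypothetical filename, used only for illustration.
i="sample.txt"
unset i_assembly   # make sure the accidental variable name is really unset

# Without braces, Bash reads the variable name as "i_assembly",
# which is unset and expands to an empty string.
unbraced="new_$i_assembly.txt"

# With braces, the name ends at "}" and only $i is expanded.
braced="new_${i}_assembly.txt"

echo "$unbraced"   # new_.txt
echo "$braced"     # new_sample.txt_assembly.txt
```

This is why the original loop kept overwriting a single file named new_.txt instead of creating one file per input.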

If you don't want the extension included in the new name, use the parameter expansion ${i%.*}, which removes the shortest match of .* from the end of the value, that is, the last . and everything after it:

for i in *.txt; do
   cut -f 3,5 "$i"  > "/media/owner/new_${i%.*}_assembly.txt" 
done
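
As a quick check of what the expansion strips, a small sketch with a hypothetical filename containing more than one dot. The %% form is shown only for contrast:

```shell
# Hypothetical filename with two dots.
i="reads.sample1.txt"

# % removes the shortest suffix matching ".*": only the last extension.
short=${i%.*}

# %% removes the longest suffix matching ".*": everything from the first dot.
long=${i%%.*}

echo "$short"   # reads.sample1
echo "$long"    # reads
```

For names like these, % is usually the one you want, since it keeps the rest of the name intact.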

If you choose a different approach that yields paths rather than bare filenames (for example **/*.txt, which matches recursively only when the globstar option is enabled, Bash 4 or later), you can use parameter expansion again to get just the name of the file:

shopt -s globstar
for i in **/*.txt; do
   base=${i##*/} 
   base=${base%.*}
   cut -f 3,5 "$i"  > "/media/owner/new_${base}_assembly.txt" 
done
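
The two expansions can be checked on a single path without touching any files. The path below is hypothetical:

```shell
# Hypothetical path, as a recursive glob might produce.
i="data/run1/sample.v2.txt"

# ##*/ removes the longest prefix ending in "/": the directory part.
base=${i##*/}
name_only=$base          # sample.v2.txt

# %.* then removes the last extension.
base=${base%.*}          # sample.v2

out="/media/owner/new_${base}_assembly.txt"
echo "$out"   # /media/owner/new_sample.v2_assembly.txt
```

Note that files with the same name in different directories would map to the same output path, so this sketch assumes the basenames are unique.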

Also note that TAB is the default delimiter for cut, so you don't need to specify it with the -d option. From the man page:

-d, --delimiter=DELIM
      use DELIM instead of TAB for field delimiter
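
A short sketch of what the default means in practice, using made-up one-line inputs. It also shows that a line containing no TAB at all is printed unchanged unless -s is given:

```shell
# Tab-separated input: cut splits on TAB by default, no -d needed.
tabbed=$(printf 'a\tb\tc\td\te\n' | cut -f 3,5)

# Space-separated input: the line contains no TAB, so cut (without -s)
# passes the whole line through unchanged.
spaced=$(printf 'a b c d e\n' | cut -f 3,5)

printf '%s\n' "$tabbed"   # c<TAB>e
printf '%s\n' "$spaced"   # a b c d e
```

If the output files look identical to the inputs, the files are probably not tab-delimited.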

Upvotes: 1

RomanPerekhrest

Reputation: 92854

Simple approach:

for f in *.txt; do
    cut -d$'\t' -f3,5 "$f" > "/media/owner/new_${f}_assembly.txt" 
done

If the fields may be separated by whitespace other than tabs (for example, runs of spaces), you can use awk instead:

for f in *.txt; do
    awk '{ print $3,$5 }' OFS='\t' "$f" > "/media/owner/new_${f}_assembly.txt" 
done
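
To illustrate why awk copes with mixed whitespace, here is a sketch on one made-up line. OFS is set with -v here rather than as a trailing assignment; for this program the effect is the same:

```shell
# Hypothetical input line mixing runs of spaces and a tab between fields.
line=$(printf 'a  b\tc d   e')

# awk's default field splitting treats any run of blanks (spaces or tabs)
# as a single separator, so the line has five fields.
with_awk=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ print $3, $5 }')

printf '%s\n' "$with_awk"   # c<TAB>e
```

The output is tab-separated regardless of how the input was spaced, which keeps the new files consistent.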

Upvotes: 3
