shadraws

Reputation: 53

split file into multiple files (by columns)

I have a file data.txt in which there are 200 columns and rows (a square matrix). I have been trying to split my file into 200 files, each of them containing one of the columns from the big data file. These were my two attempts, employing cut and awk, but I don't understand why they are not working.

NM=`awk 'NR==1{print NF-2}' < file.txt`
echo $NM

for (( i=1; i = $NM; i++ ))
do
echo $i 
cut -f ${i} file.txt > tmpgrid_0${i}.dat
#awk '{print '$i'}'  file.txt > tmpgrid_0${i}.dat
done

Any suggestions?

EDIT: Thank you very much to all of you. All answers were valid, but I cannot vote for all of them.

Upvotes: 5

Views: 3715

Answers (3)

Erik

Reputation: 7342

An alternative solution using tr and split:

< file.txt tr ' ' '\n' | split -nr/200

This assumes that the file is space delimited, but the tr command could be tweaked as appropriate for any delimiter. Essentially this puts each entry on its own line, and then uses split's round-robin mode to write every 200th line to the same file.

paste -d' ' x* | cmp - file.txt

verifies that it worked, since split writes its output files with an x prefix by default.

I got this solution from Reuti on the coreutils mailing list.
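If you don't want to hardcode 200, a minimal sketch of the same idea (assuming a space-delimited file and GNU split) derives the column count from the first row instead:

# count the columns in the first row, then round-robin into that many files
NCOLS=$(awk 'NR==1{print NF}' file.txt)
< file.txt tr ' ' '\n' | split -n r/"$NCOLS"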

Upvotes: 1

Vijay

Reputation: 67211

awk '{for(i=1;i<=5;i++){name=FILENAME"_"i; print $i > name}}' your_file

Tested with 5 columns:

> cat temp
PHE  5  2 4 6
PHE  5  4 6 4
PHE  5  4 2 8
TRP  7  5 5 9
TRP  7  5 7 1
TRP  7  5 7 3
TYR  2  4 4 4
TYR  2  4 4 0
TYR  2  4 5 3
> nawk '{for(i=1;i<=5;i++){name=FILENAME"_"i;print $i> name}}' temp
> ls -1  temp_*
temp_1
temp_2
temp_3
temp_4
temp_5
> cat temp_1
PHE
PHE
PHE
TRP
TRP
TRP
TYR
TYR
TYR
> 
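To cover all columns without hardcoding the count, a variant of the same one-liner (my own sketch, untested on the 200-column file) loops up to NF instead; note that some awk implementations limit the number of simultaneously open files, so close() may be needed for very wide files:

awk '{for(i=1;i<=NF;i++){name=FILENAME"_"i; print $i > name}}' your_file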

Upvotes: 7

Mark Setchell

Reputation: 207385

To summarise my comments, I suggest something like this (untested as I have no sample file):

NM=$(awk 'NR==1{print NF-2}' file.txt)
echo $NM

for (( i=1; i <= $NM; i++ ))
do
   echo $i 
   awk '{print $'$i'}'  file.txt > tmpgrid_0${i}.dat
done
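A slightly tidier variant of the same loop (also untested, same assumptions) passes the column index to awk with -v instead of splicing it into the quoting:

NM=$(awk 'NR==1{print NF-2}' file.txt)
for (( i=1; i <= $NM; i++ ))
do
   # col is a shell-supplied awk variable holding the current column number
   awk -v col="$i" '{print $col}' file.txt > "tmpgrid_0${i}.dat"
done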

Upvotes: 2
