astabada

Reputation: 1059

Awk: extract different columns from many different files

File Example

I have between 3 and 10 files with:

 - different number of columns
 - same number of rows
 - inconsistent spacing **within** the same file (sometimes a single space, sometimes tabs, sometimes many spaces), like the example below:


>      0    55.4      9.556E+09   33
>      1     1.3      5.345E+03    1
>        ........
>     33   134.4      5.345E+04  932
>        ........

I need to get column (say) 1 from file1, column 3 from file2, column 7 from file3 and column 1 from file4 and combine them into a single file, side by side.

Trial 1: not working

paste <(cut -d[see below] -f1 file1) <(cut -d[see below] -f3 file2) [...]

where the delimiter was either a single space (' ') or empty.
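A minimal illustration of why Trial 1 fails, using one row of the sample data. `cut -d' '` treats every single space as a field boundary, so leading spaces and runs of spaces produce empty fields and shift the column numbers. With no `-d` at all, cut splits on tabs and prints any line that contains no tab unchanged. awk's default splitting avoids both problems.

```shell
#!/bin/sh
# One row from the sample above: leading spaces plus runs of spaces.
line='     0    55.4      9.556E+09   33'

# cut splits on every single space, so field 1 is the empty string
# before the first space and the numbering of later fields shifts.
printf '%s\n' "$line" | cut -d' ' -f1

# awk's default splitting skips leading blanks and treats any run of
# spaces/tabs as one separator, so field 1 is "0" as expected.
printf '%s\n' "$line" | awk '{ print $1 }'
```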

Trial 2: working with 2 files but not with many files

awk '{
     a1=$1;b1=$4;
     getline <"D2/file1.txt";
     print a1,$1,b1,$4
}' D1/file1.txt >D3/file1.txt
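The getline idea from Trial 2 can be extended to more files by giving each extra file its own `getline` call. The sketch below is a hypothetical shell function `lockstep` (not from the question); it hard-codes the example columns (1 from file1, 3 from file2, 7 from file3, 1 from file4) and assumes all files have the same number of rows. `split(l, a)` with no separator argument uses FS, so the extra files get the same whitespace-tolerant splitting as the main input.

```shell
#!/bin/sh
# Hypothetical sketch: file1 is awk's main input, the other three files
# are read one line at a time, in step with it, via getline.
# Assumes: same row count in every file; columns fixed to 1, 3, 7, 1.
lockstep() {
    awk -v f2="$2" -v f3="$3" -v f4="$4" '
    {
        c1 = $1                                    # column 1 of the main file
        # split() with default FS merges runs of spaces/tabs like $1..$NF
        if ((getline l < f2) > 0) { split(l, a); c2 = a[3] } else c2 = ""
        if ((getline l < f3) > 0) { split(l, a); c3 = a[7] } else c3 = ""
        if ((getline l < f4) > 0) { split(l, a); c4 = a[1] } else c4 = ""
        print c1, c2, c3, c4
    }' "$1"
}
```

Usage: `lockstep file1 file2 file3 file4 > combined`. Because each file is read one line at a time, this does not hold whole files in memory.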

Now the more general question:

How can I extract different columns from many different files?

Upvotes: 6

Views: 33289

Answers (3)

mouviciel

Reputation: 67929

In your paste/cut attempt, replace cut with awk. Unlike cut, awk's default field splitting treats any run of spaces and tabs as a single separator:

$ paste <(awk '{print $1}' file1 ) <(awk '{print $3}' file2 ) <(awk '{print $7}' file3) <(awk '{print $1}' file4)
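`<(...)` process substitution is a bash/ksh/zsh feature and is not available in plain POSIX sh. A minimal sketch for that case, as a hypothetical function `paste_columns` that takes `file:column` pairs (the column number follows the last colon), writes each extracted column to a numbered temporary file and pastes those instead. It assumes `mktemp -d` is available, which is common but not required by POSIX. Like the command above, paste joins the columns with tabs; use `paste -d' '` if you want spaces.

```shell
#!/bin/sh
# Hypothetical helper: paste_columns FILE:COL [FILE:COL ...]
# POSIX sh lacks <(...), so each extracted column goes to a temp file.
paste_columns() {
    tmpdir=$(mktemp -d) || return 1
    i=0
    for spec in "$@"; do
        i=$((i + 1))
        file=${spec%:*}          # everything before the last ':'
        col=${spec##*:}          # everything after the last ':'
        # awk's default field splitting copes with mixed spaces and tabs
        awk -v c="$col" '{ print $c }' "$file" > "$tmpdir/$(printf '%03d' "$i")"
    done
    paste "$tmpdir"/*            # zero-padded names keep the glob in argument order
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

Usage: `paste_columns file1:1 file2:3 file3:7 file4:1 > combined`.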

Upvotes: 21

Steve

Reputation: 54592

Assuming each of your files has the same number of rows, here's one way using GNU awk. Run it like this:

awk -f script.awk file1.txt file2.txt file3.txt file4.txt

Contents of script.awk:

FILENAME == ARGV[1] { one[FNR]=$1 }
FILENAME == ARGV[2] { two[FNR]=$3 }
FILENAME == ARGV[3] { three[FNR]=$7 }
FILENAME == ARGV[4] { four[FNR]=$1 }

END {
    for (i=1; i<=length(one); i++) {
        print one[i], two[i], three[i], four[i]
    }
}

Note:

By default, awk splits columns on whitespace: spaces and tabs, in any combination and any number, count as a single separator, and leading whitespace is ignored. This makes awk well suited to files with inconsistent spacing. You can also extend the script above to more files by adding one rule per file.
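Adding one rule per file gets repetitive. A sketch of a generic variant, written as a hypothetical shell function `pick_columns` that takes the column numbers as one space-separated string (one per file, in order) and counts files itself instead of comparing FILENAME with ARGV. It keeps everything in memory like the script above and uses only POSIX awk features. It assumes every file is non-empty: awk never reaches `FNR == 1` for an empty file, so the file/column pairing would shift.

```shell
#!/bin/sh
# Hypothetical helper: pick_columns "COLS" FILE...
# COLS holds one column number per file, e.g. "1 3 7 1".
# Assumes every file is non-empty (see the lead-in).
pick_columns() {
    cols=$1
    shift
    awk -v cols="$cols" '
        BEGIN    { split(cols, col, " ") }     # col[k] = column to take from file k
        FNR == 1 { f++ }                       # a new input file has started
        {
            if (f == 1) out[FNR] = $(col[f])   # first file starts each output row
            else        out[FNR] = out[FNR] OFS $(col[f])
            if (FNR > rows) rows = FNR         # longest file decides the row count
        }
        END { for (i = 1; i <= rows; i++) print out[i] }
    ' "$@"
}
```

Usage: `pick_columns "1 3 7 1" file1.txt file2.txt file3.txt file4.txt`.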

Upvotes: 8

user647772

The combination of cut and paste should work, as long as each file uses a single, consistent delimiter:

$ cat f1
foo
bar
baz
$ cat f2
1 2 3
4 5 6
7 8 9
$ cat f3
a b c d
e f g h
i j k l
$ paste -d' ' <(cut -f1 f1) <(cut -d' ' -f2 f2) <(cut -d' ' -f3 f3)
foo 2 c
bar 5 g
baz 8 k

Edit: This works with tab-separated files, too (tab is cut's default delimiter):

$ cat f4
a       b       c       d
e       f       g       h
i       j       k       l
$ paste -d' ' <(cut -f1 f1) <(cut -d' ' -f2 f2) <(cut -f3 f4)   
foo 2 c
bar 5 g
baz 8 k
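If you want to stay with cut on files whose spacing is inconsistent, one option is to normalise each file first. A minimal sketch, as a hypothetical function `cut_ws COL FILE`: sed strips leading blanks, `tr -s` turns every run of spaces and tabs into a single tab, and cut then splits on its default tab delimiter.

```shell
#!/bin/sh
# Hypothetical helper: cut_ws COL FILE
# cut treats each delimiter character as a field boundary, so first
# strip leading blanks and squeeze runs of spaces/tabs into one tab.
cut_ws() {
    sed 's/^[[:blank:]]*//' "$2" | tr -s ' \t' '\t\t' | cut -f"$1"
}
```

In bash it drops into the command above, e.g. `paste -d' ' <(cut_ws 1 file1) <(cut_ws 3 file2)`.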

Upvotes: 1
