extract a list of data from multiple files

Question

I would like to ask for help with this. Thank you very much!

I have thousands of files, each containing 5 columns, with names in the first column.

$ cat file1   
name math eng hist sci    
Kyle 56 45 68 97    
Angela 88 86 59 30    
June 48 87 85 98

I also have a file containing a list of names that can be found in the 5-column files.

$ cat list.txt    
June    
Isa    
Angela    
Manny

Specifically, I want to extract, say, the data in the 3rd column for the names in my list file and arrange it in a structured way: one column per file (there are thousands) and one row per name. If a name in the list file is not present in a 5-column file, its value should be shown as 0. Additionally, the columns should be headed with the filenames.

$ cat output.txt    
names file1 file2 file3 file4    
June 87 65 67 87    
Isa 0 0 0 54    
Angela 86 75 78 78
Manny 39 46 0 38

James Brown · Accepted Answer

I used your test files list.txt and file1 (the latter twice) for testing. First, the awk program:

$ cat program.awk
function process(n,a,   i) {     # append the previous file's grades for the chosen ones
    for(i in n)                  # iterate through all chosen ones
        n[i]=n[i] (n[i]==""?"":OFS) (i in a?a[i]:0)  # append, 0 if the name is missing
    split("",a)                  # clear a so grades do not leak into the next file
}
FNR==1 {                         # for each new file
    h=h (h==""?"":OFS) FILENAME  # build header
    if(++f>2)                    # from the second data file on
        process(n,a)             # process the previous data file in hash a
}
NR==FNR {                        # chosen ones to hash n
    n[$1]
    next
}
$1 in n {                        # add chosen ones to a
    a[$1]=$3
}
END {
    if(f>1)                      # if at least one data file was read
        process(n,a)             # process the last one
    print h                      # print the header
    for(i in n)                  # and names with grades
        print i,n[i]
}

Running it:

$ awk -f program.awk list.txt file1 file1
list.txt file1 file1
Manny 0 0
Isa 0 0
Angela 86 86
June 87 87
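
The question asks for a header that starts with names and lists only the data files, while the run above puts list.txt in the header and prints rows in awk's arbitrary for-in order. Below is a minimal sketch of one way to get both, assuming a POSIX shell and awk. The wrapper name extract_col and its column-number argument are hypothetical and not part of the program above.

```shell
# Hypothetical helper (an illustration, not the program above).
# Usage: extract_col COLUMN LISTFILE DATAFILE...
extract_col() {
    col=$1; shift
    awk -v col="$col" '
    function flush(   i) {               # append the finished file as one column
        for (i = 1; i <= cnt; i++)
            row[i] = row[i] OFS ((name[i] in val) ? val[name[i]] : 0)
        split("", val)                   # reset the per-file values
    }
    NR == FNR {                          # first file: the list of names
        if (NF && !($1 in seen)) {       # skip blank lines and duplicates
            seen[$1]
            name[++cnt] = $1             # remember the list order
            row[cnt] = $1
        }
        next
    }
    FNR == 1 {                           # first line of each data file
        if (started) flush()             # close the previous data file
        started = 1
        hdr = hdr OFS FILENAME           # header lists data files only
    }
    $1 in seen { val[$1] = $col }        # keep the requested column
    END {
        if (started) flush()             # close the last data file
        print "names" hdr
        for (i = 1; i <= cnt; i++)       # rows in list order
            print row[i]
    }' "$@"
}
```

With the question's sample files, extract_col 3 list.txt file1 should print the header names file1 followed by June 87, Isa 0, Angela 86 and Manny 0, in that order. Like the program above, it treats the first file argument as the list, so it assumes list.txt is not empty.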
