user1723196

Reputation: 125

Using awk to process a database

I have a directory on my computer which contains an entire database I found online for my research. This database contains thousands of files, so to do what I need I've been looking into file i/o stuff. A programmer friend suggested using bash/awk. I've written my code:

    #!/usr/bin/env awk
    ls -l|awk'
    BEGIN {print "Now running"}
    {if(NR == 17 / $1 >= 0.4 / $1 <= 2.5)
    {print $1 > wavelengths.txt;
    print $2 > reflectance.txt;
    print $3 > standardDev.txt;}}END{print "done"}'

When I put this into my console, I'm already in the directory of the files I need to access. The data I need begins on line 17 of EVERY file. The data looks like this:

some number    some number    some number
some number    some number    some number
    .              .              .
    .              .              .
    .              .              .

I want to access the data when the first column has a value of 0.4 (or approximately) and get the information up until the first column has a value of approximately 2.5. The first column represents wavelengths. I want to verify they are all the same for each file later, so I copy them into a file. The second column represents reflectance and I want this to be a separate file because later I'll take this information and build a data matrix from it. And the third column is the standard deviation of the reflectance.

The problem I am having now is that when I run this code, I get the following error: No such file or directory

Please, if anyone can tell me why I might be getting this error, or can guide me as to how to write the code for what I am trying to do... I will be so grateful.

Upvotes: 3

Views: 1405

Answers (2)

Steve

Reputation: 54402

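Good attempt, but you should never parse the output of ls. If you did want one file name per line, you were probably looking for ls -1, not ls -l. Either way, piping ls into awk makes awk read the list of file names rather than the contents of the files. awk accepts file names as arguments, so you can pass it a shell glob instead. For example, in the desired directory, you can run: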

awk -f /path/to/script.awk *

Contents of script.awk:

BEGIN {
    print "Now running"
}

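# FNR is the line number within the current file (NR counts across all
# files), so this selects data rows from line 17 onward in every file.
FNR >= 17 && $1 >= 0.4 && $1 <= 2.5 {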
    print $1 > "wavelengths.txt"
    print $2 > "reflectance.txt"
    print $3 > "standardDev.txt"
}

END {
    print "Done"
}
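
To try the approach without touching the real database, here is a minimal sketch under a few assumptions: the sample files, their values, and the directory names data and out are all made up for the demonstration. It uses FNR (the per-file line number) with >= so that every line from 17 onward is considered in each file, and it writes the results to a separate directory so that a later run of the glob does not pick up the output files as input.

```shell
#!/bin/sh
# Hypothetical demo: build two small sample files, then extract the columns.
set -eu

workdir=$(mktemp -d)
mkdir "$workdir/data" "$workdir/out"

# Each sample file gets 16 header lines followed by three-column data.
for name in a.txt b.txt; do
    {
        i=1
        while [ "$i" -le 16 ]; do
            echo "header line $i"
            i=$((i + 1))
        done
        printf '0.3 10 1\n0.4 20 2\n1.5 30 3\n2.5 40 4\n2.6 50 5\n'
    } > "$workdir/data/$name"
done

cd "$workdir/data"

# FNR restarts at 1 for each input file, so "FNR >= 17" skips every header.
# Output goes to ../out so the glob never matches the result files.
awk -v outdir="../out" '
FNR >= 17 && $1 >= 0.4 && $1 <= 2.5 {
    print $1 > (outdir "/wavelengths.txt")
    print $2 > (outdir "/reflectance.txt")
    print $3 > (outdir "/standardDev.txt")
}' *

cat ../out/reflectance.txt
```

With this sample data, each file contributes the rows whose first column is 0.4, 1.5 and 2.5, so every output file ends up with six lines. The parentheses around the concatenated file name keep the redirection target unambiguous across awk implementations.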

Upvotes: 3

Ed Morton

Reputation: 203532

The main problem is that you need to quote the output file names, because they are strings, not variables. An unquoted word in awk is read as a variable name rather than as literal text. Use:

print $1 > "wavelengths.txt"

instead of:

print $1 > wavelengths.txt
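
As a small illustration of the string-versus-variable distinction, the sketch below writes one column using a quoted literal and another using an awk variable set with -v. The variable name out and the sample input file are assumptions made up for the demonstration.

```shell
#!/bin/sh
set -eu

tmpdir=$(mktemp -d)
cd "$tmpdir"

# Sample three-column input (made-up values).
printf '0.4 20 2\n1.5 30 3\n' > sample.dat

# Quoted literal: the text between the double quotes is the file name.
awk '{ print $1 > "wavelengths.txt" }' sample.dat

# Variable: "out" is set with -v and used unquoted, because here we do
# want awk to look up the value stored in the variable.
awk -v out="reflectance.txt" '{ print $2 > out }' sample.dat

cat wavelengths.txt reflectance.txt
```

Both forms write to a file whose name is a string; the only difference is whether that string appears literally in the program or is stored in a variable first.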

Upvotes: 3
