Adrian Mak

Reputation: 157

How to use a csv file as input for basic arithmetic operations in bash

I've stored my data in a file, neckrev_dim.csv, structured as follows:

subjectID,dim3,pixdim3
MR44825,405,0.625

I also have a separate subjects.csv, containing just the subject IDs:

MR44825
MR55843

Now I want to use this data in basic arithmetic operations using bash.

subjlist=subjects.csv
for subj in ` cat $subjlist `
do
    dim3=$(grep -w '$subj' neckrev_dim.csv | cut -d ',' -f 2)
    pixdim3=$(grep -w '$subj' neckrev_dim.csv | cut -d ',' -f 3)
    total_length=$(($dim3*$pixdim3))
    echo $total_length
done

This leads to the following error:

syntax error: operand expected (error token is "*")

I think the problem lies within the grep, but I can't figure it out.

Thanks in advance!

Upvotes: 1

Views: 131

Answers (3)

Ian D. Allen

Reputation: 221

The solution below is designed to work accurately and more generally for different types of key values and different CSV lines, avoiding some of the limitations and failure modes of the other solutions.

Description of the code

Read keys one per line from the file keys.txt, look each key up in the first field of the CSV file generic.csv, and do some floating-point (non-integer) math on the numbers in the remaining fields.

Performance enhancements:

  1. If $key selects a unique row in the file, change XexitX below to exit so that awk doesn't keep reading the rest of the file unnecessarily; otherwise, delete XexitX and it will do all the lines matching that key.
  2. If generic.csv is a large file, then sort it and replace the awk line with the look --binary line. This will replace a linear search with a binary search. Make sure you sort the whole file:
    sort -o generic.csv generic.csv

Limitations:

  1. The value of $key must not contain backslashes or double quotes in the awk version. This could be fixed by running sed -e 's/\\/&&/g' -e 's/"/\\"/g' on the key. The look --binary version has no such restriction.
  2. The generic.csv file must use commas only, no "quoted" CSV fields. This means no fields may contain commas.
  3. The look --binary version does key prefix matching on the CSV lines, so you can't have a key that is a prefix of another, e.g. keys ABC and AB aren't distinct. The awk version doesn't have this problem.
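
As a rough sketch of the sed fix mentioned in limitation 1, the key can be escaped before it is pasted into the awk program. The helper name escape_key and the sample data are made up for illustration; they are not part of the original answer.

```shell
# Hypothetical helper: double each backslash and backslash-escape each
# double quote so the key survives inside an awk string literal.
escape_key() {
    printf '%s\n' "$1" | sed -e 's/\\/&&/g' -e 's/"/\\"/g'
}

# Sample data (assumption): one key contains a backslash and a double quote.
csv=$(mktemp)
printf '%s\n' 'a\b"c,2,3' 'plain,4,5' > "$csv"

key='a\b"c'
escaped=$(escape_key "$key")

# Same awk line as in the answer, but with the escaped key substituted in.
found=$(awk -F, "\$1 == \"$escaped\" { print ; exit }" "$csv")
printf '%s\n' "$found"

rm -f "$csv"
```

Without the escaping step, the backslash and quote in the key would be interpreted by awk as part of its string syntax, and the lookup would fail or error out.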

Advantages of this over other solutions:

  1. Reads the CSV only once per key, not multiple times.
  2. The $key is matched exactly on the first field and not on any fields that might be added to the rest of the CSV line - no false matches. (The look --binary version does do prefix matching, so you can't have a key that is a prefix of another.)
  3. The key field is a text field, not a regular expression, so it may contain special characters without any need to worry about escaping regular expression metacharacters to avoid errors.
  4. No need to use grep or cut to separate fields; only one pipe, not three.
  5. Can easily scale up to huge CSV files by using look --binary instead of awk.

while read -r key ; do
    # SEE NOTES: look --binary "$key" generic.csv \
    awk -F, "\$1 == \"$key\" { print ; XexitX }" generic.csv \
        | while IFS=, read -r key num1 num2  ; do
            echo "$key: $(dc -e "$num1 $num2 * p")"
        done
done <keys.txt
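
The exact-match advantages listed above can be illustrated with a small comparison, assuming a made-up two-line CSV and a key that contains a regular expression metacharacter. Note that this sketch passes the key with awk -v rather than embedding it in the program text as the answer does; -v is fine here only because the sample key has no backslashes.

```shell
# Sample data (assumption): two rows whose first fields differ by one character.
csv=$(mktemp)
printf '%s\n' 'AXB,1,2' 'A.B,3,4' > "$csv"

key='A.B'

# grep treats the key as a regular expression, so "." matches any character.
grep_hits=$(grep -c -w "$key" "$csv")

# awk compares the first field to the key as plain text.
awk_hits=$(awk -F, -v k="$key" '$1 == k { n++ } END { print n + 0 }' "$csv")

echo "grep matched $grep_hits line(s), awk matched $awk_hits line(s)"

rm -f "$csv"
```

Here grep counts both rows, because the pattern A.B also matches AXB, while the awk string comparison counts only the row whose first field is literally A.B.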

Upvotes: 1

costaparas

Reputation: 5237

The main issue is that POSIX shell arithmetic, $((...)), does not support decimals, only integers.

You will have to use something else, like bc for non-integer arithmetic.

The other issue is that you are single-quoting $subj -- you should use double quotes so the variable gets expanded.

Try the following:

subjlist=subjects.csv

while read -r subj
do
    dim3=$(grep -w "$subj" neckrev_dim.csv | cut -d ',' -f 2)
    pixdim3=$(grep -w "$subj" neckrev_dim.csv | cut -d ',' -f 3)
    echo "$dim3 * $pixdim3" | bc
done < "$subjlist"

Note that bc reads from standard input here, so we just need to echo the arithmetic expression to bc.
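
If bc is not installed, awk can do the same floating-point multiplication. This is a swapped-in alternative rather than the approach used above, and the function name multiply is hypothetical. The values are the ones from the question's sample row.

```shell
# Hypothetical helper: multiply two decimal numbers with awk.
# %.10g prints up to 10 significant digits and drops trailing zeros.
multiply() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.10g\n", a * b }'
}

dim3=405
pixdim3=0.625
total_length=$(multiply "$dim3" "$pixdim3")
echo "$total_length"
```

Inside the loop, the echo ... | bc line could be replaced by a call such as multiply "$dim3" "$pixdim3".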

Upvotes: 1

Kris

Reputation: 21

You need to change the single quotes to double quotes around the $subj. Single quotes won't expand the variable.
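
A minimal sketch of the difference, using the sample row from the question as inline data instead of the real file:

```shell
subj=MR44825
csv_line='MR44825,405,0.625'

# Single quotes: grep is given the literal text $subj, so nothing matches
# and the result is an empty string.
single_match=$(printf '%s\n' "$csv_line" | grep -w '$subj' | cut -d ',' -f 2)

# Double quotes: the shell expands $subj to MR44825 before grep runs.
double_match=$(printf '%s\n' "$csv_line" | grep -w "$subj" | cut -d ',' -f 2)

echo "single quotes: '$single_match'"
echo "double quotes: '$double_match'"
```

With single quotes both dim3 and pixdim3 end up empty, so the arithmetic expansion sees only the * operator with nothing on either side, which is what produces the "operand expected" error.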

Upvotes: 1
