nreminder
nreminder

Reputation: 45

How to grep a shell variable for values present in psql output?

Here'a short summary on what I'm trying to do:

Let's say I have a CSV stored as a $variable in shell. It looks like this:

account,index,quantity
100,AAPL,10
105,NFLX,25
110,TSLA,50
120,TWTR,45

Now I query a PSQL db from shell as such:

accounts=$(psql -d mydb -h mydb -f "SELECT account_num FROM accounts WHERE is_relevant")

Now I have a list of accounts, stored in a seemingly unstructured variable. Simply, I want to filter my original CSV for the values given in my new accounts query.

1) When I call echo on the variable storing my query result, I get a long string of output - just a concatenated list of all relevant accounts.

2) When I call head on the variable, each account number throws an error:

head: cannot open '100' for reading: No such file or directory

I see this, and I think, "shell is not recognizing these entries as strings to print, but rather commands to run" - and I'm not sure how to fix this. Attempts to use sed to place quotes or commas to delimit the strings have thrown similar errors - either regarding missing files or nonexistent commands.

While I suspect grep is ultimately the correct tool for this - I want to pose this open-endedly. How would you do this?

Edit: to clarify, given my original account table, if the PSQL query returns:

100
105
120

I want to filter the original table on these values, to obtain:

account,index,quantity
100,AAPL,10
105,NFLX,25
120,TWTR,45

(Row with account # 110 has been filtered out.)

Upvotes: 1

Views: 969

Answers (1)

You may try this after the query:

# Create a filtered_variable to store the filtered results
# and add the first line from the original variable (the CSV header)
filtered_variable=$(echo "$variable" | head -n 1)

# For each account in the accounts obtained in the query
for account in $accounts
do
    # Create a filtered_line variable to store the line where the account
    # appears in the CSV, or an empty line if the account is not in the CSV
    filtered_line=$(echo "$variable" | grep "^$account,")

    # If $filtered_line is not empty (the account is in the CSV) ...
    if [ ! -z "$filtered_line" ]
    then
        # ... add the line to the filtered_variable (filtered CSV)
        filtered_variable+=$'\n'"$filtered_line"
    fi
done

Now you have the filtered table in the variable filtered_variable. If you want it in the original variable, then just do variable="$filtered_variable" after the loop.

Alternative solution

You can also use egrep with a regex that includes all the accounts returned in the query. For instance

echo "$variable" | egrep -e "^100,|^110,"

will return

100,AAPL,10
110,TSLA,50

This regex looks for lines starting with 100, or 110,. I have added the , to avoid false possitive matches.

So all you need is to create that regex for all the accounts returned in the query. This can be done easily with sed:

filter=$(echo "^$accounts," | sed -e 's/ /,|^/g')

Now you have your filter as a regex in the variable filter, and all that remains is to do the egrep:

filtered_variable=$(echo "$variable" | egrep "$filter")

And again you will have the filtered accounts in the auxiliar variable filtered_variable (don't forget to add the CSV header line first).

Upvotes: 1

Related Questions