Reputation: 45
Here'a short summary on what I'm trying to do:
Let's say I have a CSV stored as a $variable in shell. It looks like this:
account,index,quantity
100,AAPL,10
105,NFLX,25
110,TSLA,50
120,TWTR,45
Now I query a PSQL db from shell as such:
accounts=$(psql -d mydb -h mydb -f "SELECT account_num FROM accounts WHERE is_relevant")
Now I have a list of accounts, stored in a seemingly unstructured variable. Simply, I want to filter my original CSV for the values given in my new accounts query.
1) When I call echo on the variable storing my query result, I get a long string of output - just a concatenated list of all relevant accounts.
2) When I call head on the variable, each account number throws an error:
head: cannot open '100' for reading: No such file or directory
I see this, and I think, "shell is not recognizing these entries as strings to print, but rather commands to run" - and I'm not sure how to fix this. Attempts to use sed to place quotes or commas to delimit the strings have thrown similar errors - either regarding missing files or nonexistent commands.
While I suspect grep is ultimately the correct tool for this - I want to pose this open-endedly. How would you do this?
Edit: to clarify, given my original account table, if the PSQL query returns:
100
105
120
I want to filter the original table on these values, to obtain:
account,index,quantity
100,AAPL,10
105,NFLX,25
120,TWTR,45
(Row with account # 110 has been filtered out.)
Upvotes: 1
Views: 969
Reputation: 484
You may try this after the query:
# Create a filtered_variable to store the filtered results
# and add the first line from the original variable (the CSV header)
filtered_variable=$(echo "$variable" | head -n 1)
# For each account in the accounts obtained in the query
for account in $accounts
do
# Create a filtered_line variable to store the line where the account
# appears in the CSV, or an empty line if the account is not in the CSV
filtered_line=$(echo "$variable" | grep "^$account,")
# If $filtered_line is not empty (the account is in the CSV) ...
if [ ! -z "$filtered_line" ]
then
# ... add the line to the filtered_variable (filtered CSV)
filtered_variable+=$'\n'"$filtered_line"
fi
done
Now you have the filtered table in the variable filtered_variable
. If you want it in the original variable
, then just do variable="$filtered_variable"
after the loop.
Alternative solution
You can also use egrep
with a regex that includes all the accounts returned in the query. For instance
echo "$variable" | egrep -e "^100,|^110,"
will return
100,AAPL,10
110,TSLA,50
This regex looks for lines starting with 100,
or 110,
. I have added the ,
to avoid false possitive matches.
So all you need is to create that regex for all the accounts returned in the query. This can be done easily with sed
:
filter=$(echo "^$accounts," | sed -e 's/ /,|^/g')
Now you have your filter as a regex in the variable filter
, and all that remains is to do the egrep
:
filtered_variable=$(echo "$variable" | egrep "$filter")
And again you will have the filtered accounts in the auxiliar variable filtered_variable
(don't forget to add the CSV header line first).
Upvotes: 1