Bankey Biharidassa

Reputation: 37

How to loop sed to get a variable

I have a CSV file of 15,000 rows, and I want to delete the rows for unwanted products/manufacturers. I have a list of the unwanted manufacturers and the source CSV file.

I found that sed would be appropriate, but I'm stuck on the loop.

while read line
do
    unwanted = $
sed "|"$unwanted|d" /home/arno/pixtmp/pixtmp.csv >/home/arno/pixtmp/pix-clean.c$
done < /home/bankey/shopimport/unwanted.txt

Any help is appreciated.

Input file:

CONSUMABLES;Inktpatronen voor printer;Inkt voor printer;B0137790;HP;Pakket 2 inktpatronen No339 - Zwart + Papier Goodway - 80 g/m² - A4 - 500 vel;Dit pakket van 2 inktpatronen nr 339 zijn ontworpen voor uw HP printer en leveren afdrukken van kwaliteit.;47.19;6.99;47.19;http://pan8.fotovista.com/dev/8/5/32150358/l_32150358.jpg;in stock;0.2;0.11201;9.99;;C9504EE;0;;

Upvotes: 1

Views: 369

Answers (3)

Jonathan Leffler

Reputation: 753725

I'd use sed in two steps:

  1. Create the sed script from the unwanted information.
  2. Apply the created script to the data file.

That might be:

unwanted=/home/bankey/shopimport/unwanted.txt
datafile=/home/arno/pixtmp/pixtmp.csv
cleaned=/home/arno/pixtmp/pix-clean.csv

sed 's%.*%/,&,/d%' $unwanted > sed.script
sed -f sed.script  $datafile > $cleaned

rm -f sed.script

The first invocation of sed simply replaces the contents of each line of the unwanted list with a sed command that deletes any data line containing that value as a comma-separated field in the middle of the line. If you have to handle unwanted fields at the beginning or the end too, then you have to work harder. You also have to work harder if there might be embedded slashes, commas, quotes, etc. The second invocation of sed applies the script created by the first to the data file, generating the cleaned file.
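The "work harder" cases can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is semicolon-separated (as the sample row in the question is, so the commas in the pattern above would not match it), the unwanted entries contain no regex metacharacters or slashes, and the file names and sample rows are made up for the demo. It uses extended regular expressions (sed -E, available in current GNU and BSD sed) so that the generated commands match a field at the beginning, in the middle, or at the end of a line.

```shell
#!/bin/sh
# Hypothetical demo data in a scratch directory; not the real files.
workdir=$(mktemp -d)
printf '%s\n' 'HP' 'Canon' > "$workdir/unwanted.txt"
printf '%s\n' \
    'A;B;HP;x' \
    'A;B;HPX;x' \
    'Canon;a;b' \
    'a;b;Canon' \
    'a;Epson;c' > "$workdir/data.csv"

# Step 1: turn each unwanted value into a delete command such as
#   /(^|;)HP(;|$)/d
# which matches the value only as a complete semicolon-delimited field.
sed 's%.*%/(^|;)&(;|$)/d%' "$workdir/unwanted.txt" > "$workdir/sed.script"

# Step 2: apply the generated script; -E enables the (a|b) alternation.
result=$(sed -E -f "$workdir/sed.script" "$workdir/data.csv")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Only the HPX and Epson rows should survive: HPX is not an exact field match, and Epson is not on the list.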

You can improve it by ensuring the script file name is unique, and by removing the script file if the process is interrupted:

tmp=$(mktemp /tmp/script.XXXXXX)
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15 # EXIT, HUP, INT, QUIT, PIPE, TERM

unwanted=/home/bankey/shopimport/unwanted.txt
datafile=/home/arno/pixtmp/pixtmp.csv
cleaned=/home/arno/pixtmp/pix-clean.csv

sed 's%.*%/,&,/d%' $unwanted > $tmp
sed -f $tmp $datafile > $cleaned

rm -f $tmp
trap 0  # Cancel the exit trap

With GNU sed, but not with Mac OS X (BSD) sed, you could avoid the intermediate file thus:

unwanted=/home/bankey/shopimport/unwanted.txt
datafile=/home/arno/pixtmp/pixtmp.csv
cleaned=/home/arno/pixtmp/pix-clean.csv

sed 's%.*%/,&,/d%' $unwanted |
sed -f - $datafile > $cleaned

This tells the second sed to read its script from standard input. If your shell supports process substitution (bash and zsh do; plain POSIX sh does not), you could use that instead:

unwanted=/home/bankey/shopimport/unwanted.txt
datafile=/home/arno/pixtmp/pixtmp.csv
cleaned=/home/arno/pixtmp/pix-clean.csv

sed -f <(sed 's%.*%/,&,/d%' $unwanted) $datafile > $cleaned

Upvotes: 1

William Pursell

Reputation: 212248

sed is less suited to this than awk. For example, assuming your input file and your list of undesired terms are space delimited, you could simply do:

awk 'NR==FNR { a[$0]++ } NR != FNR && !a[$1]' undesired input

This will print the file 'input', omitting any line whose first column matches a line in the file 'undesired'.
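For the semicolon-separated sample row in the question, the same idea might look like the sketch below. It assumes the manufacturer is always the fifth field (as HP is in the sample row) and that the undesired list has one manufacturer per line; the file names and rows are invented for the demo.

```shell
#!/bin/sh
# Hypothetical demo data in a scratch directory; not the real files.
workdir=$(mktemp -d)
printf '%s\n' 'HP' 'Canon' > "$workdir/undesired"
printf '%s\n' \
    'CONSUMABLES;Ink;Ink;B01;HP;Pack of 2;47.19' \
    'CONSUMABLES;Ink;Ink;B02;Epson;HP compatible;12.00' \
    'HARDWARE;Printer;Laser;B03;Canon;Printer;99.00' > "$workdir/input"

# First file (NR == FNR): remember every unwanted manufacturer.
# Second file: print only rows whose 5th field is not in that set.
result=$(awk -F';' '
    NR == FNR { unwanted[$0]; next }
    !($5 in unwanted)
' "$workdir/undesired" "$workdir/input")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Because the comparison is on the fifth field only, the Epson row is kept even though "HP" appears in its description.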

Upvotes: 0

Ansgar Wiechers

Reputation: 200283

You have to make sure that each loop cycle takes the output file from the previous cycle as its input file; otherwise you'll keep overwriting the output file with the content of the original file minus the last unwanted record.

If your sed command supports in-place editing (option -i), you can do this:

cp /home/arno/pixtmp/pixtmp.csv /home/arno/pixtmp/pix-clean.csv
while IFS= read -r line; do
  sed -i "/$line/d" /home/arno/pixtmp/pix-clean.csv
done < /home/bankey/shopimport/unwanted.txt

Otherwise you have to handle the temporary file yourself:

cp /home/arno/pixtmp/pixtmp.csv /home/arno/pixtmp/pix-clean.csv
while IFS= read -r line; do
  sed "/$line/d" /home/arno/pixtmp/pix-clean.csv >/home/arno/pixtmp/pix-clean.c$
  mv -f /home/arno/pixtmp/pix-clean.c$ /home/arno/pixtmp/pix-clean.csv
done < /home/bankey/shopimport/unwanted.txt
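To see why the chaining matters, here is a small reproduction with made-up data (the file names and rows are hypothetical). The first loop always reads the original file, so only the last unwanted entry ends up removed; the second loop feeds each pass the result of the previous one.

```shell
#!/bin/sh
# Hypothetical demo data in a scratch directory; not the real files.
workdir=$(mktemp -d)
printf '%s\n' 'HP' 'Canon' > "$workdir/unwanted.txt"
printf '%s\n' 'a;HP;1' 'b;Canon;2' 'c;Epson;3' > "$workdir/src.csv"

# Naive loop: every pass filters the ORIGINAL file and overwrites the output.
while IFS= read -r line; do
    sed "/$line/d" "$workdir/src.csv" > "$workdir/naive.csv"
done < "$workdir/unwanted.txt"
naive=$(cat "$workdir/naive.csv")

# Chained loop: every pass filters the output of the previous pass.
cp "$workdir/src.csv" "$workdir/clean.csv"
while IFS= read -r line; do
    sed "/$line/d" "$workdir/clean.csv" > "$workdir/clean.tmp"
    mv -f "$workdir/clean.tmp" "$workdir/clean.csv"
done < "$workdir/unwanted.txt"
chained=$(cat "$workdir/clean.csv")

printf 'naive:\n%s\nchained:\n%s\n' "$naive" "$chained"
rm -rf "$workdir"
```

The naive result still contains the HP row, while the chained result contains only the Epson row.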

Upvotes: 0
