Kay Gee
Kay Gee

Reputation: 97

Edit a CSV column through another column

I have a Bash script which prompts for input in order to edit a CSV column:

echo X, as recorded in TIFF header, to be removed:
read X
sed -i "" "/^[^,]*_f_[^,_]*,/s/,$X /,f. /
    s/,$X /,/" $pathToCSV".csv"

Now I want to avoid the manual, one-at-a-time inputting and instead use another column of the CSV (column C) which includes the value X to edit this first column. This way, I can batch generate these CSVs from TIFF headers and not have to input X (which differs from one CSV to the next) each time.

I looked around and tried a few things, including:

X=$(awk 'NR == 2 {print $3}' $pathToCSV".csv")
sed -i "" "/^[^,]*_f_[^,_]*,/s/,$X /,f. /
    s/,$X /,/" $pathToCSV".csv"

This doesn't work and the output still includes X. (The column C includes the same value in all its rows, and I'm using C2 arbitrarily.)

Update 1:

As requested, I'm pasting part of my CSV here:

/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0001_a_1.tif   Si Ar 695 Front Board Outside   Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0002_a_1a.tif  Si Ar 695 Front Board Outside   Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0003_b_000.tif Si Ar 695 Front Board Inside    Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0009_b_003v.tif    Si Ar 695 Flyleaf 003v  Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0010_f_001r.tif    Si Ar 695 001r  Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0060_y_001r.tif    Si Ar 695 Flyleaf 001r  Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0070_y_999.tif Si Ar 695 Back Board Inside Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0071_z_1.tif   Si Ar 695 Back Board Outside    Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0072_z_1a.tif  Si Ar 695 Back Board Outside    Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0073_z_2.tif   Si Ar 695 Spine Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0074_z_3.tif   Si Ar 695 Fore edge Si Ar 695

Instead of inputting "Si Ar 695" manually (and some other value for a different CSV), the third column should be used to remove the repeat value from the front of the second column.

Update 2:

The awk command below that @anubhava suggested works perfectly:

awk 'BEGIN{FS=OFS=","} NF>2 && NR==1{s=$3} {sub("^" s "[[:blank:]]+", "", $2)} 1' $pathToCSV".csv"

However, I just remembered the sed command I had previously, i.e.

sed -i "" "/^[^,]*_f_[^,_]*,/s/,$X /,f. /
    s/,$X /,/" $pathToCSV".csv"

would prepend my second column with f. whenever the file name in the first column included a _f_. I think this should be a regex adjustment, but I'm struggling with implementing that in the awk command above.

This is the input:

/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0010_f_001r.tif    Si Ar 695 001r  Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0011_f_001v.tif    Si Ar 695 001v  Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0012_f_002r.tif    Si Ar 695 002r  Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0013_f_002v.tif    Si Ar 695 002v  Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0014_f_003ar.tif   Si Ar 695 003r  Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0015_f_003av.tif   Si Ar 695 003v  Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0016_f_004br.tif   Si Ar 695 004r  Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0017_f_004bv.tif   Si Ar 695 004v  Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0018_f_005r.tif    Si Ar 695 005r  Si Ar 695

And this the desired output:

/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0010_f_001r.tif    f. 001r Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0011_f_001v.tif    f. 001v Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0012_f_002r.tif    f. 002r Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0013_f_002v.tif    f. 002v Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0014_f_003ar.tif   f. 003r Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0015_f_003av.tif   f. 003v Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0016_f_004br.tif   f. 004r Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0017_f_004bv.tif   f. 004v Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0018_f_005r.tif    f. 005r Sinai Arabic 695

I would appreciate any help with Update 2!

Upvotes: 0

Views: 146

Answers (2)

anubhava
anubhava

Reputation: 784958

You may use this awk:

awk 'BEGIN{FS=OFS=","} NR==2{s=$2} NR>1{sub("^" s "[[:blank:]]+", "")} 1' file.csv

$2,$3
Front Board Outside,Si Ar 695
Front Board Inside,Si Ar 695
Flyleaf 001r,Si Ar 695
Flyleaf 001v,Si Ar 695

As per your pastebin link you may use this command:

awk 'BEGIN{FS=OFS="\t"} NF>2 && NR==1{s=$3} {sub("^" s "[[:blank:]]+", "", $2)} 1' file

Update 2:

As per your latest update, you will need to use following awk command:

awk 'BEGIN{FS=OFS=","} NF>2 && NR==1{s=$3} {sub("^" s "[[:blank:]]+", "", $2)} $1 ~ /_f_/{$2 = "f. " $2} 1' file.csv

Upvotes: 2

aborruso
aborruso

Reputation: 5678

Using your raw data (https://pastebin.com/V9Jg4icj) and Miller (https://github.com/johnkerl/miller) and running

mlr --nidx --fs tab cut -x -f 1 then put -S '$2=gsub($2,"^.+[0-9] ","")' input

You will have

Front Board Outside     Si Ar 695
Front Board Outside     Si Ar 695
Front Board Inside      Si Ar 695
Flyleaf 001r    Si Ar 695
Flyleaf 001v    Si Ar 695
Flyleaf 002r    Si Ar 695
Flyleaf 002v    Si Ar 695
Flyleaf 003r    Si Ar 695
Flyleaf 003v    Si Ar 695
001r    Si Ar 695
001v    Si Ar 695
002r    Si Ar 695
002v    Si Ar 695
003r    Si Ar 695
003v    Si Ar 695
004r    Si Ar 695
004v    Si Ar 695
005r    Si Ar 695
005v    Si Ar 695
006r    Si Ar 695
006v    Si Ar 695
007r    Si Ar 695
007v    Si Ar 695
008r    Si Ar 695
008v    Si Ar 695
009r    Si Ar 695
...    ...
...    ...

If you want a CSV output

mlr --n2c --ifs tab cut -x -f 1 then put -S '$2=gsub($2,"^.+[0-9] ","")' input

Upvotes: 1

Related Questions