Reputation: 97
I have a Bash script which prompts for input in order to edit a CSV column:
echo X, as recorded in TIFF header, to be removed:
read X
sed -i "" "/^[^,]*_f_[^,_]*,/s/,$X /,f. /
s/,$X /,/" $pathToCSV".csv"
Now I want to avoid the manual, one-at-a-time inputting and instead use another column of the CSV (column C) which includes the value X
to edit this first column. This way, I can batch generate these CSVs from TIFF headers and not have to input X
(which differs from one CSV to the next) each time.
I looked around and tried a few things, including:
X=$(awk 'NR == 2 {print $3}' $pathToCSV".csv")
sed -i "" "/^[^,]*_f_[^,_]*,/s/,$X /,f. /
s/,$X /,/" $pathToCSV".csv"
This doesn't work and the output still includes X
. (The column C includes the same value in all its rows, and I'm using C2 arbitrarily.)
Update 1:
As requested, I'm pasting part of my CSV here:
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0001_a_1.tif Si Ar 695 Front Board Outside Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0002_a_1a.tif Si Ar 695 Front Board Outside Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0003_b_000.tif Si Ar 695 Front Board Inside Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0009_b_003v.tif Si Ar 695 Flyleaf 003v Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0010_f_001r.tif Si Ar 695 001r Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0060_y_001r.tif Si Ar 695 Flyleaf 001r Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0070_y_999.tif Si Ar 695 Back Board Inside Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0071_z_1.tif Si Ar 695 Back Board Outside Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0072_z_1a.tif Si Ar 695 Back Board Outside Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0073_z_2.tif Si Ar 695 Spine Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0074_z_3.tif Si Ar 695 Fore edge Si Ar 695
Instead of inputting "Si Ar 695" manually (and some other value for a different CSV), the third column should be used to remove the repeat value from the front of the second column.
Update 2:
The awk
command below that @anubhava suggested works perfectly:
awk 'BEGIN{FS=OFS=","} NF>2 && NR==1{s=$3} {sub("^" s "[[:blank:]]+", "", $2)} 1' $pathToCSV".csv"
However, I just remembered the sed
command I had previously, i.e.
sed -i "" "/^[^,]*_f_[^,_]*,/s/,$X /,f. /
s/,$X /,/" $pathToCSV".csv"
would prepend my second column with f.
whenever the file name in the first column included a _f_
. I think this should be a regex adjustment, but I'm struggling with implementing that in the awk
command above.
This is the input:
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0010_f_001r.tif Si Ar 695 001r Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0011_f_001v.tif Si Ar 695 001v Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0012_f_002r.tif Si Ar 695 002r Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0013_f_002v.tif Si Ar 695 002v Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0014_f_003ar.tif Si Ar 695 003r Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0015_f_003av.tif Si Ar 695 003v Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0016_f_004br.tif Si Ar 695 004r Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0017_f_004bv.tif Si Ar 695 004v Si Ar 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0018_f_005r.tif Si Ar 695 005r Si Ar 695
And this the desired output:
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0010_f_001r.tif f. 001r Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0011_f_001v.tif f. 001v Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0012_f_002r.tif f. 002r Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0013_f_002v.tif f. 002v Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0014_f_003ar.tif f. 003r Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0015_f_003av.tif f. 003v Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0016_f_004br.tif f. 004r Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0017_f_004bv.tif f. 004v Sinai Arabic 695
/Volumes/Masters/DLTempSecure/EMEL_SLDP/201808/tiffs/arabic_0695/sld_arb0695_0018_f_005r.tif f. 005r Sinai Arabic 695
I would appreciate any help with Update 2!
Upvotes: 0
Views: 146
Reputation: 784958
You may use this awk
:
awk 'BEGIN{FS=OFS=","} NR==2{s=$2} NR>1{sub("^" s "[[:blank:]]+", "")} 1' file.csv
$2,$3
Front Board Outside,Si Ar 695
Front Board Inside,Si Ar 695
Flyleaf 001r,Si Ar 695
Flyleaf 001v,Si Ar 695
As per your pastebin link you may use this command:
awk 'BEGIN{FS=OFS="\t"} NF>2 && NR==1{s=$3} {sub("^" s "[[:blank:]]+", "", $2)} 1' file
Update 2:
As per your latest update, you will need to use following awk command:
awk 'BEGIN{FS=OFS=","} NF>2 && NR==1{s=$3} {sub("^" s "[[:blank:]]+", "", $2)} $1 ~ /_f_/{$2 = "f. " $2} 1' file.csv
Upvotes: 2
Reputation: 5678
Using your raw data (https://pastebin.com/V9Jg4icj) and Miller (https://github.com/johnkerl/miller) and running
mlr --nidx --fs tab cut -x -f 1 then put -S '$2=gsub($2,"^.+[0-9] ","")' input
You will have
Front Board Outside Si Ar 695
Front Board Outside Si Ar 695
Front Board Inside Si Ar 695
Flyleaf 001r Si Ar 695
Flyleaf 001v Si Ar 695
Flyleaf 002r Si Ar 695
Flyleaf 002v Si Ar 695
Flyleaf 003r Si Ar 695
Flyleaf 003v Si Ar 695
001r Si Ar 695
001v Si Ar 695
002r Si Ar 695
002v Si Ar 695
003r Si Ar 695
003v Si Ar 695
004r Si Ar 695
004v Si Ar 695
005r Si Ar 695
005v Si Ar 695
006r Si Ar 695
006v Si Ar 695
007r Si Ar 695
007v Si Ar 695
008r Si Ar 695
008v Si Ar 695
009r Si Ar 695
... ...
... ...
If you want a CSV output
mlr --n2c --ifs tab cut -x -f 1 then put -S '$2=gsub($2,"^.+[0-9] ","")' input
Upvotes: 1