I have a file inventory.txt that contains hundreds of lines. It lists Customer IDs/Names, Inventory IDs/Names, and Product IDs/Names. The general layout is that a customerId=123 line may appear on any given line, and following that line, one or more inventoryId=abc lines will appear, each followed by its Product lines. The file looks something like this:
<> START OF FILE
Customer ID=9000, Customer Name=Acme, Inc
Inventory ID=INV_ID1, Inventory Name=Acme_INV1
Product ID=100, Product Name=Banana
Product ID=200, Product Name=Apple
Inventory ID=INV_ID2, Inventory Name=Acme_INV2
Product ID=100, Product Name=Banana
Product ID=300, Product Name=Kiwi
Customer ID=7500, Customer Name=Anvil, Corp
Inventory ID=INV_ID3, Inventory Name=Anvil_INV1
Product ID=200, Product Name=Apple
<> END OF FILE
What I would like to do, using SED or any alternative that works well enough, is create a CSV-formatted file with a single line for each customer/inventory combination that includes just the Customer ID/Name and Inventory ID/Name fields. So the output would look something like:
"9000", "Acme, Inc", "INV_ID1", "Acme_INV1"
"9000", "Acme, Inc", "INV_ID2", "Acme_INV2"
"7500", "Anvil, Corp", "INV_ID3", "Anvil_INV1"
I understand how to use SED to format the input data as CSV with commas and quotation marks, but I am having trouble figuring out how to make the Customer ID and Customer Name repeat at the beginning of every Inventory ID and Inventory Name line.
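The core of the problem is carrying state: remember the most recent Customer line and print it in front of every Inventory line that follows, ignoring Product lines. As a hedged sketch of that idea in Python (not one of the tools asked about, and not taken from any answer below), assuming every file follows the layout of the sample above; the names SAMPLE, id_and_name and to_csv are made up for illustration:

```python
import csv
import io

# Sample input copied from the question; in practice, read inventory.txt instead.
SAMPLE = """\
Customer ID=9000, Customer Name=Acme, Inc
Inventory ID=INV_ID1, Inventory Name=Acme_INV1
Product ID=100, Product Name=Banana
Product ID=200, Product Name=Apple
Inventory ID=INV_ID2, Inventory Name=Acme_INV2
Product ID=100, Product Name=Banana
Product ID=300, Product Name=Kiwi
Customer ID=7500, Customer Name=Anvil, Corp
Inventory ID=INV_ID3, Inventory Name=Anvil_INV1
Product ID=200, Product Name=Apple
"""


def id_and_name(line):
    """Return (id, name) from a line shaped like 'X ID=..., X Name=...'.

    Only the first ', ' separates the two fields, so a name such as
    'Acme, Inc' stays intact.
    """
    id_part, _, name_part = line.partition(", ")
    return id_part.partition("=")[2], name_part.partition("=")[2]


def to_csv(lines):
    """Emit one CSV row per Inventory line, prefixed by the last Customer seen."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    customer = None  # (Customer ID, Customer Name) carried forward
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("Customer ID="):
            customer = id_and_name(line)
        elif line.startswith("Inventory ID=") and customer is not None:
            # Assumption: Inventory lines before any Customer line are skipped.
            writer.writerow([*customer, *id_and_name(line)])
        # Product lines (and anything else) fall through and are ignored.
    return out.getvalue()


print(to_csv(SAMPLE.splitlines()), end="")
```

To process the real file, pass an open file object instead, e.g. `with open("inventory.txt") as f: print(to_csv(f), end="")`. Using csv.writer with QUOTE_ALL also doubles any double quote embedded in a name, which hand-built quoting does not do.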
Using a gawk extension to the match() function (the optional third argument, an array that receives the capture groups):
gawk '
  match($0, /^Customer ID=([^,]+), Customer Name=(.*)/, cust) {
    c_id=cust[1]; c_name=cust[2]
    next
  }
  match($0, /^Inventory ID=([^,]+), Inventory Name=(.*)/, inv) {
    printf "\"%s\",\"%s\",\"%s\",\"%s\"\n", c_id, c_name, inv[1], inv[2]
  }
' filename
outputs
"9000","Acme, Inc","INV_ID1","Acme_INV1"
"9000","Acme, Inc","INV_ID2","Acme_INV2"
"7500","Anvil, Corp","INV_ID3","Anvil_INV1"
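For comparison, the same capture-group approach can be sketched in Python, where a regex match object plays the role of gawk's third match() argument. This is an illustrative translation under the assumption that lines look like the sample; CUSTOMER_RE, INVENTORY_RE and rows are hypothetical names, and the walrus operator needs Python 3.8 or later:

```python
import re

SAMPLE = """\
Customer ID=9000, Customer Name=Acme, Inc
Inventory ID=INV_ID1, Inventory Name=Acme_INV1
Product ID=100, Product Name=Banana
Product ID=200, Product Name=Apple
Inventory ID=INV_ID2, Inventory Name=Acme_INV2
Product ID=100, Product Name=Banana
Product ID=300, Product Name=Kiwi
Customer ID=7500, Customer Name=Anvil, Corp
Inventory ID=INV_ID3, Inventory Name=Anvil_INV1
Product ID=200, Product Name=Apple
"""

# Same patterns as the gawk answer: group 1 is the ID, group 2 the name.
CUSTOMER_RE = re.compile(r"^Customer ID=([^,]+), Customer Name=(.*)")
INVENTORY_RE = re.compile(r"^Inventory ID=([^,]+), Inventory Name=(.*)")


def rows(lines):
    c_id = c_name = ""  # empty until a Customer line is seen, like awk variables
    for line in lines:
        if m := CUSTOMER_RE.match(line):
            c_id, c_name = m.groups()  # remember the current customer
        elif m := INVENTORY_RE.match(line):
            inv_id, inv_name = m.groups()
            yield f'"{c_id}","{c_name}","{inv_id}","{inv_name}"'


for row in rows(SAMPLE.splitlines()):
    print(row)
```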
Here's one way using awk:
awk -F= '{ sub(/,.*/,"",$2) } /^Customer ID/ { r = $2 OFS $3 } /^Inventory ID/ { print "\"" r, $2, $3 "\"" }' OFS="\", \"" inventory.txt
Or a sed solution:
sed -n '/^Customer ID/ h; /^Inventory ID/ { G; s/.*=\([^,]*\).*=\([^\n]*\).*=\([^,]*\).*=\(.*\)/"\3", "\4", "\1", "\2"/; p }' inventory.txt
Results:
"9000", "Acme, Inc", "INV_ID1", "Acme_INV1"
"9000", "Acme, Inc", "INV_ID2", "Acme_INV2"
"7500", "Anvil, Corp", "INV_ID3", "Anvil_INV1"
awk explanation:
OFS="\", \"" # set the output field separator to: ", "
-F= # split the line into three fields using the '=' character
{ sub(/,.*/,"",$2) }      # on each line of input, remove the first comma and
                          # everything after it from field two.
/^Customer ID/ { ... } # if the line starts with 'Customer ID'; do
r = $2 OFS $3 # build a record using field two and three separated by 'OFS'
/^Inventory ID/ {...} # if the line starts with 'Inventory ID'; do
print "\"" r, $2, $3 "\"" # print out a double-quote, the record, OFS, $2, OFS,
# $3 and lastly a double quote
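The field-splitting steps above can be mirrored in Python to see what awk's $2 and $3 hold at each stage. This is a hedged translation, not the answer's code; OFS and awk_style are illustrative names, and padding the split result mimics awk treating missing fields as empty strings:

```python
SAMPLE = """\
Customer ID=9000, Customer Name=Acme, Inc
Inventory ID=INV_ID1, Inventory Name=Acme_INV1
Product ID=100, Product Name=Banana
Product ID=200, Product Name=Apple
Inventory ID=INV_ID2, Inventory Name=Acme_INV2
Product ID=100, Product Name=Banana
Product ID=300, Product Name=Kiwi
Customer ID=7500, Customer Name=Anvil, Corp
Inventory ID=INV_ID3, Inventory Name=Anvil_INV1
Product ID=200, Product Name=Apple
"""

OFS = '", "'  # the answer's OFS="\", \""


def awk_style(lines):
    r = ""  # saved customer record: ID, OFS, Name
    for line in lines:
        line = line.rstrip("\n")
        # -F= : split on '='; pad so missing fields read as "" (as in awk).
        fields = line.split("=") + ["", ""]
        f2 = fields[1].split(",", 1)[0]  # sub(/,.*/,"",$2)
        f3 = fields[2]
        if line.startswith("Customer ID"):
            r = f2 + OFS + f3  # r = $2 OFS $3
        elif line.startswith("Inventory ID"):
            yield '"' + r + OFS + f2 + OFS + f3 + '"'  # print "\"" r, $2, $3 "\""


for row in awk_style(SAMPLE.splitlines()):
    print(row)
```

On "Customer ID=9000, Customer Name=Acme, Inc" the split gives "9000, Customer Name" as the second field, which the trim reduces to "9000", while the third field keeps its embedded comma ("Acme, Inc").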
sed explanation:
Disable default printing with the -n flag. When a line starts with "Customer ID", copy it to the hold space (h). When a line that starts with "Inventory ID" is found, append a newline and the hold space to the current line (G). A single substitution then captures the value after each of the four '=' signs and re-orders them into quoted, comma-separated fields, and p prints the result.
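To make the hold-space trick concrete, here is a hedged Python emulation: a string variable stands in for the hold space, G becomes "append a newline plus the held line", and the answer's regex is reused unchanged. re.DOTALL is an assumption chosen so that '.' can cross the embedded newline the way it does in sed's pattern space; SWAP and sed_style are illustrative names:

```python
import re

SAMPLE = """\
Customer ID=9000, Customer Name=Acme, Inc
Inventory ID=INV_ID1, Inventory Name=Acme_INV1
Product ID=100, Product Name=Banana
Product ID=200, Product Name=Apple
Inventory ID=INV_ID2, Inventory Name=Acme_INV2
Product ID=100, Product Name=Banana
Product ID=300, Product Name=Kiwi
Customer ID=7500, Customer Name=Anvil, Corp
Inventory ID=INV_ID3, Inventory Name=Anvil_INV1
Product ID=200, Product Name=Apple
"""

# The sed answer's regex: capture the text after each of the four '=' signs.
SWAP = re.compile(r".*=([^,]*).*=([^\n]*).*=([^,]*).*=(.*)", re.DOTALL)


def sed_style(lines):
    hold = ""  # stands in for sed's hold space
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("Customer ID"):
            hold = line  # h: copy the pattern space to the hold space
        elif line.startswith("Inventory ID"):
            pattern = line + "\n" + hold  # G: append a newline plus the hold space
            # s/.../"\3", "\4", "\1", "\2"/ and then p
            yield SWAP.sub(r'"\3", "\4", "\1", "\2"', pattern, count=1)


for row in sed_style(SAMPLE.splitlines()):
    print(row)
```

After G the pattern space for the first inventory line is "Inventory ID=INV_ID1, Inventory Name=Acme_INV1\nCustomer ID=9000, Customer Name=Acme, Inc", so groups 1 and 2 come from the inventory half and groups 3 and 4 from the customer half; the replacement simply lists the customer groups first.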
This might work for you (GNU sed):
sed -r '/^Customer/{h;d};/^Inventory/!d;G;s/.*=([^,]*).*=([^\n]*).*=([^,]*).*=(.*)/"\3", "\4", "\1", "\2"/' file
Another awk one-liner, without using FS:
awk -vq="\"" '/^(Cus|Inv)/{f=$0~/^Cus/;gsub(/[^,]*=/,q);sub(/,/,q",");c=f?$0q:c;if(!f)print c","$0q}' file
test:
kent$ echo "Customer ID=9000, Customer Name=Acme, Inc
Inventory ID=INV_ID1, Inventory Name=Acme_INV1
Product ID=100, Product Name=Banana
Product ID=200, Product Name=Apple
Inventory ID=INV_ID2, Inventory Name=Acme_INV2
Product ID=100, Product Name=Banana
Product ID=300, Product Name=Kiwi
Customer ID=7500, Customer Name=Anvil, Corp
Inventory ID=INV_ID3, Inventory Name=Anvil_INV1
Product ID=200, Product Name=Apple"|awk -vq="\"" '/^(Cus|Inv)/{f=$0~/^Cus/;gsub(/[^,]*=/,q);sub(/,/,q",");c=f?$0q:c;if(!f)print c","$0q}'
"9000","Acme, Inc","INV_ID1","Acme_INV1"
"9000","Acme, Inc","INV_ID2","Acme_INV2"
"7500","Anvil, Corp","INV_ID3","Anvil_INV1"
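This one-liner works by substitution rather than field splitting: gsub(/[^,]*=/,q) turns every "Key=" prefix into a double quote, and sub(/,/,q",") closes the first field by replacing the first comma with `",`. A hedged Python sketch of the same steps, with Q, strip_keys and gsub_style as illustrative names:

```python
import re

SAMPLE = """\
Customer ID=9000, Customer Name=Acme, Inc
Inventory ID=INV_ID1, Inventory Name=Acme_INV1
Product ID=100, Product Name=Banana
Product ID=200, Product Name=Apple
Inventory ID=INV_ID2, Inventory Name=Acme_INV2
Product ID=100, Product Name=Banana
Product ID=300, Product Name=Kiwi
Customer ID=7500, Customer Name=Anvil, Corp
Inventory ID=INV_ID3, Inventory Name=Anvil_INV1
Product ID=200, Product Name=Apple
"""

Q = '"'  # the answer's -vq="\""


def strip_keys(line):
    s = re.sub(r"[^,]*=", Q, line)  # gsub(/[^,]*=/,q): each 'Key=' becomes a quote
    return s.replace(",", Q + ",", 1)  # sub(/,/,q","): close the first field


def gsub_style(lines):
    c = ""  # saved customer part, already quoted
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(("Cus", "Inv")):  # /^(Cus|Inv)/
            continue
        is_customer = line.startswith("Cus")  # f=$0~/^Cus/
        s = strip_keys(line)
        if is_customer:
            c = s + Q  # c=f?$0q:c
        else:
            yield c + "," + s + Q  # print c","$0q


for row in gsub_style(SAMPLE.splitlines()):
    print(row)
```

For the first customer line, strip_keys produces `"9000","Acme, Inc`; the comma inside "Acme, Inc" survives because only the first comma is rewritten.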
Perl solution:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw(say);

my ($customer, $name);
while (<>) {
    if (/Customer ID=(.*), Customer Name=(.*)/) {
        ($customer, $name) = ($1, $2);
    } elsif (/Inventory ID=(.*), Inventory Name=(.*)/) {
        say join ', ' => map qq("$_"), $customer, $name, $1, $2;
    }
}