I have to collect data from a web page, which gets saved as a .txt file. For further processing, this .txt file has to be converted to CSV with an AWK script.
The .txt file has the following structure:
GME - Esiti dei mercati - MGP-GAS - asta (AGS)
Tabella esiti - MGP-GAS prezzi e volumi Esiti MGP-GAS ||
|sessione del: 30/03/2020 |
|
|
Prodotti |
Prezzo |
€/MWh |
Volumi totali |
MW |MWh |
Acquisti SRG_TSO |
MWh |
Vendite SRG_TSO |
MWh |
MGP-2020-03-31 |8,625 |
|4.027,000 |96.648,000 |
|- |
|96.648,000 |
|
|
|
|
|
Legenda
||
LEGENDA ||
Prezzo
|Prezzo di remunerazione di cui all'Art. 103 della disciplina del Mercato del Gas naturale.
|
Volumi (MW, MWh)
|Volumi accettati di cui all'Art. 103 della disciplina del Mercato del Gas naturale.
|
Acquisti SRG_TSO
|Quantità accettate in acquisto da Snam Rete Gas.
|
Vendite SRG_TSO
|Quantità accettate in vendita da Snam Rete Gas.
|
|
The values I need to extract into the CSV are the ones after MGP-2020-03-31, with pipes "|" as separators. EDIT: more precisely, I need these lines:
MGP-2020-03-31 |8,625 |
|4.027,000 |96.648,000 |
|- |
|96.648,000 |
|
In this format: 8,625|4.027,000|96.648,000|- |96.648,000
I have no experience with AWK; so far I've managed to write this:
/Non ci sono dati/{
    exit
}
/sessione del/{
    data = $3
}
/MGP/{
    data = data $0
    print data
}
/Non ci sono dati/{
    print $0
}
I'm trying to catch the "no data" case, i.e. whenever the page shows "Non ci sono dati" ("there is no data"). How can I get the values on the lines below the first one (the one with the 8,625 value)? Can you please help? Thank you.
Here's how to approach your problem, assuming a blank line or a line containing just "|" in the input marks the end of the MGP section:
$ cat tst.awk
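# On the MGP-yyyy-mm-dd line, strip the "MGP...|" prefix and start collecting.
sub(/^[[:space:]]*MGP[^|]+[|][[:space:]]*/,"") { inMgp=1 }
inMgp {
    # Drop the trailing "|" (and any surrounding spaces) from each line.
    sub(/[[:space:]]*[|][[:space:]]*$/,"")
    if ( NF ) {
        # Non-empty line: append what is left to the record.
        data = data $0
    }
    else {
        # Empty line (it was just "|"): end of the section, so normalize
        # every separator to a bare "|" and print the record.
        gsub(/[[:space:]]*[|][[:space:]]*/,"|",data)
        print data
        inMgp = 0
    }
}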
$ awk -f tst.awk file
8,625|4.027,000|96.648,000|-|96.648,000
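The script above doesn't cover the "Non ci sono dati" case from the question. Here's a minimal sketch, not tested against the live page, that wraps the same logic in a shell function (`mgp_to_csv` is a hypothetical name). It assumes the "no data" page contains that exact phrase and exits with status 1 when it sees it, so a calling script can detect the case. As the question's own script attempted, it also prepends the session date taken from the "sessione del:" line.

```shell
#!/bin/sh
# Minimal sketch built on the tst.awk logic above.
# mgp_to_csv is a hypothetical helper name; it reads the named files (or stdin).
mgp_to_csv() {
    awk '
    # "No data" page: print nothing and exit with status 1 so callers can check it.
    /Non ci sono dati/ { exit 1 }

    # "|sessione del: 30/03/2020 |": the third whitespace-separated field is the date.
    /sessione del/ { session = $3 }

    # Same extraction as tst.awk: start collecting on the MGP-yyyy-mm-dd line.
    sub(/^[[:space:]]*MGP[^|]+[|][[:space:]]*/, "") { inMgp = 1 }
    inMgp {
        sub(/[[:space:]]*[|][[:space:]]*$/, "")
        if (NF) {
            data = data $0
        } else {
            gsub(/[[:space:]]*[|][[:space:]]*/, "|", data)
            print session "|" data
            inMgp = 0
            data = ""   # reset in case more MGP rows follow (assumption)
        }
    }
    ' "$@"
}
```

Running `mgp_to_csv file` on the sample input should print `30/03/2020|8,625|4.027,000|96.648,000|-|96.648,000`. On a "no data" page it prints nothing and returns 1, so something like `mgp_to_csv file > out.csv || echo "no data"` can react to it. Resetting `data` after each record only matters if a page ever lists more than one MGP product, which the sample doesn't show.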