Anacarnil

Reputation: 55

How to convert a .txt file into .csv using AWK

I have to collect data from a web page, which is saved as a .txt file. For further processing, this .txt file has to be converted to .csv using an AWK script.

The txt has the following structure:

    GME - Esiti dei mercati - MGP-GAS - asta (AGS) 
    Tabella esiti - MGP-GAS prezzi e volumi Esiti MGP-GAS ||

       |sessione del: 30/03/2020    |
    |
    |
    Prodotti |
    Prezzo |
    €/MWh |
    Volumi totali |
    MW |MWh |
    Acquisti SRG_TSO |
    MWh |
    Vendite SRG_TSO |
    MWh |

    MGP-2020-03-31 |8,625 |
    |4.027,000 |96.648,000 |
    |- |
    |96.648,000 |
    |

    |
    |
    |
    |
    Legenda 
    ||
    LEGENDA ||
    Prezzo  
    |Prezzo di remunerazione di cui all'Art. 103 della disciplina del Mercato del Gas naturale. 
    |
    Volumi (MW, MWh)  
    |Volumi accettati di cui all'Art. 103 della disciplina del Mercato del Gas naturale. 
    |
    Acquisti SRG_TSO  
    |Quantità accettate in acquisto da Snam Rete Gas. 
    |
    Vendite SRG_TSO  
    |Quantità accettate in vendita da Snam Rete Gas. 
    |
    |

The values I need to extract and import into the csv are the ones after MGP-2020-03-31, using pipes "|" as separators. EDIT: more precisely, I need to turn this block:

    MGP-2020-03-31 |8,625 |
    |4.027,000 |96.648,000 |
    |- |
    |96.648,000 |
    |

into this format: 8,625|4.027,000|96.648,000|- |96.648,000

I have no experience with AWK. So far I've managed to write this:

    /Non ci sono dati/{
        exit
    }

    /sessione del/{
         data =  $3
    }

    /MGP/{
        data = data $0 

        print data
    }

    /Non ci sono dati/{
        print $0
    }

I'm trying to catch the "no data" case, which occurs whenever the page shows "Non ci sono dati". How can I get the values on the lines beneath the first one (the one with the 8,625 value)? Can you please help? Thank you.

Upvotes: 1

Views: 158

Answers (1)

Ed Morton

Reputation: 203493

Here's one way to approach your problem, assuming that a blank line, or a line containing just |, marks the end of the MGP section:

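    $ cat tst.awk
    # Start of the MGP row: strip the product name and the pipe after it.
    sub(/^[[:space:]]*MGP[^|]+[|][[:space:]]*/,"") { inMgp=1 }
    inMgp {
        # Drop the trailing pipe; a blank or pipe-only line leaves NF == 0.
        sub(/[[:space:]]*[|][[:space:]]*$/,"")
        if ( NF ) {
            data = data $0
        }
        else {
            # End of section: normalize the separators and print the record.
            gsub(/[[:space:]]*[|][[:space:]]*/,"|",data)
            print data
            inMgp = 0
        }
    }

    $ awk -f tst.awk file
    8,625|4.027,000|96.648,000|-|96.648,000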
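
The script above does not cover the "Non ci sono dati" case or the session date that the question's attempt tried to capture. The following is only a sketch of how both might be folded in, not part of the original answer. The wrapper name `extract_mgp`, the `no data` message, and the choice to put the date in front of the values are all assumptions made for illustration. It also uses `[ \t]` in place of `[[:space:]]`, purely as a precaution for awk builds that lack POSIX character classes.

```shell
#!/bin/sh
# Hypothetical wrapper: extract_mgp FILE prints "date|v1|v2|..." for the MGP row.
extract_mgp() {
    awk '
    # Assumption: a page without results contains this text; report it and stop.
    /Non ci sono dati/ { print "no data"; exit }

    # Remember the session date from a line like "|sessione del: 30/03/2020 |".
    /sessione del/ {
        date = $0
        sub(/^.*sessione del:[ \t]*/, "", date)   # drop everything before the date
        sub(/[ \t]*[|].*$/, "", date)             # drop the trailing pipe
    }

    # Start of the MGP row: strip the product name and the pipe after it.
    # data is reset here so that a second MGP row would start from scratch.
    sub(/^[ \t]*MGP[^|]+[|][ \t]*/, "") { inMgp = 1; data = "" }

    inMgp {
        sub(/[ \t]*[|][ \t]*$/, "")               # drop the trailing pipe
        if (NF) {
            data = data $0                        # still inside the section
        } else {
            # Blank or pipe-only line: the section is over, so print the record.
            gsub(/[ \t]*[|][ \t]*/, "|", data)
            print date "|" data
            inMgp = 0
        }
    }
    ' "$1"
}
```

Run as `extract_mgp file` on the sample input from the question, this should print `30/03/2020|8,625|4.027,000|96.648,000|-|96.648,000`. Like the original script, it relies on a blank or pipe-only line following the MGP section; if the file ends right after the last value, nothing is printed.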

Upvotes: 1
