as7951

Reputation: 187

Script to convert CSV to XML

I need to process the file using a for loop.

I have written the code below to convert CSV to XML. It writes a separate tag by hand for each column.
The input file has columns 1 to 278, and the output file needs tags A1 to A278.

Code:

file_in="Prepaid_plan_voucher.csv"
file_out="Prepaid_plan_voucher.xml"
echo '<?xml version="1.0"?>' > $file_out
#echo '<Customers>' >> $file_out
echo '  <TariffRecords>' >> $file_out
echo '  <Tariff>' >> $file_out
while IFS=$',' read -r -a arry
do
#  echo '  <TariffRecords>' >> $file_out
#  echo '  <Tariff>' >> $file_out
  echo '    <A1>'${arry[0]}'</A1>' >> $file_out
  echo '    <A2>'${arry[1]}'</A2>' >> $file_out
  echo '    <A3>'${arry[2]}'</A3>' >> $file_out
#  echo '  </TariffRecords>' >> $file_out
#  echo '  </Tariff>' >> $file_out
done < $file_in
#echo '</Customers>' >> $file_out
echo '  </Tariff>' >> $file_out
echo '  </TariffRecords>' >> $file_out

Sample input file (this is only a sample; the actual input file will contain 278 columns). If the input file has two or three records, they all need to be appended to one XML file.

name,Tariff Summary,Record ID No.,Operator Name,Circle (Service Area),list
Prepaid Plan Voucher,test_All calls 2p/s,TT07PMPV0188,Ta Te,Gu,
Prepaid Plan Voucher,test_All calls 3p/s,TT07PMPV0189,Ta Te,HR,

Sample output file. The two tags used above, TariffRecords and Tariff, will be hard-coded at the beginning and end of the XML file.

<TariffRecords>
<Tariff>
<A1>Prepaid Plan Voucher</A1>
<A2>test_All calls 2p/s</A2>
<A3>TT07PMPV0188</A3>
<A4>Ta Te</A4>
<A5>Gu</A5>
<A6></A6>
</Tariff>
<Tariff>
<A1>Prepaid Plan Voucher</A1>
<A2>test_All calls 3p/s</A2>
<A3>TT07PMPV0189</A3>
<A4>Ta Te</A4>
<A5>HR</A5>
<A6></A6>
</Tariff>
</TariffRecords>

Upvotes: 0

Views: 4491

Answers (2)

Daniel Haley

Reputation: 52848

Since it was mentioned in the comments, here's an option using XSLT 3.0.

The processor I tested with is Saxon-HE 9.8, run from the Java command line. It should be easy to incorporate into a shell script to process multiple files.

CSV Input (added an additional row to show handling of another empty entry and a quoted entry that contains commas that aren't separators)

name,Tariff Summary,Record ID No.,Operator Name,Circle (Service Area),list
Prepaid Plan Voucher,test_All calls 2p/s,TT07PMPV0188,Ta Te,Gu,
Prepaid Plan Voucher,test_All calls 3p/s,TT07PMPV0189,Ta Te,HR,
Prepaid Plan Voucher,,TT07PMPV0190,Ta Te,DH,"some,comma,separated,list"

XSLT 3.0

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" expand-text="yes">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="csv-uri"/>
  <xsl:param name="csv-encoding" select="'UTF-8'"/>

  <xsl:template name="init">
    <TariffRecords>
      <xsl:choose>
        <xsl:when test="unparsed-text-available($csv-uri, $csv-encoding)">
          <xsl:call-template name="csv2xml"/>                               
        </xsl:when>
        <xsl:otherwise>
          <xsl:variable name="error">
            <xsl:text>Error reading "{$csv-uri}" (encoding "{$csv-encoding}").</xsl:text>
          </xsl:variable>
          <xsl:message><xsl:value-of select="$error"/></xsl:message>
        </xsl:otherwise>
      </xsl:choose>
    </TariffRecords>
  </xsl:template>

  <xsl:template name="csv2xml">
    <xsl:variable name="csv_content" select="unparsed-text($csv-uri, $csv-encoding)"/>
    <xsl:analyze-string select="$csv_content" regex="\r?\n">
      <xsl:non-matching-substring>
        <xsl:if test="position() > 1"><!--ignore header-->
          <Tariff>
            <xsl:analyze-string select="concat(.,',')" regex='"([^"]*)",?|([^,]+),?'>
              <!--group 1 is wrapped in quotes-->
              <!--group 2 is not wrapped quotes-->
              <xsl:matching-substring>
                <xsl:element name="A{position()}">
                  <xsl:value-of select="(regex-group(1),regex-group(2))" separator=""/>
                </xsl:element>
              </xsl:matching-substring>
              <xsl:non-matching-substring>
                <xsl:element name="A{position()}"/>
              </xsl:non-matching-substring>
            </xsl:analyze-string>
          </Tariff>          
        </xsl:if>
      </xsl:non-matching-substring>      
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>

Command line (see the Saxon documentation for more info on running Saxon from the command line)

java -cp "C:/apps/SaxonHE9-8-0-11J/saxon9he.jar" net.sf.saxon.Transform -it:init -xsl:"csv2xml.xsl" -o:"output.xml" csv-uri="input.csv"

Output

<?xml version="1.0" encoding="UTF-8"?>
<TariffRecords>
   <Tariff>
      <A1>Prepaid Plan Voucher</A1>
      <A2>test_All calls 2p/s</A2>
      <A3>TT07PMPV0188</A3>
      <A4>Ta Te</A4>
      <A5>Gu</A5>
      <A6/>
   </Tariff>
   <Tariff>
      <A1>Prepaid Plan Voucher</A1>
      <A2>test_All calls 3p/s</A2>
      <A3>TT07PMPV0189</A3>
      <A4>Ta Te</A4>
      <A5>HR</A5>
      <A6/>
   </Tariff>
   <Tariff>
      <A1>Prepaid Plan Voucher</A1>
      <A2/>
      <A3>TT07PMPV0190</A3>
      <A4>Ta Te</A4>
      <A5>DH</A5>
      <A6>some,comma,separated,list</A6>
   </Tariff>
</TariffRecords>
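
To process several files, the command above can be wrapped in a loop. This is only a sketch: the function name transform_all and the SAXON_JAR variable are made up for illustration, and it assumes csv2xml.xsl sits in the same directory as the CSV files (so the relative csv-uri values resolve) and that file names contain no spaces or other characters that would need URI escaping.

```shell
#!/usr/bin/env bash
# Sketch only: transform_all and SAXON_JAR are hypothetical names.
# The default jar path is the one from the command line shown above.
SAXON_JAR=${SAXON_JAR:-C:/apps/SaxonHE9-8-0-11J/saxon9he.jar}

transform_all() {
  local f
  for f in *.csv; do
    # If no CSV file exists, the glob stays literal; skip it.
    [ -e "$f" ] || continue
    # Same options as the single-file command; the output name is the
    # input name with .csv replaced by .xml.
    java -cp "$SAXON_JAR" net.sf.saxon.Transform \
      -it:init -xsl:csv2xml.xsl -o:"${f%.csv}.xml" csv-uri="$f"
  done
}
```

Calling transform_all from the directory that holds the CSV files should then write one XML file per input.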

Upvotes: 2

hradecek

Reputation: 2513

Though this is not the most elegant solution, I think it does what you want, if I understand correctly. Making as few modifications to your code as possible, I got:

NUM_OF_COLS=6
echo '<TariffRecords>' >> "$file_out"
while IFS=',' read -r -a arry
do
  tariff="  <Tariff>\n"
  # Tags are numbered A1..A6, while the array indices run 0..5
  for i in $(seq 1 "$NUM_OF_COLS"); do
    tariff="${tariff}    <A$i>${arry[$((i-1))]}</A$i>\n"
  done
  tariff="${tariff}  </Tariff>"
  echo -e "${tariff}" >> "$file_out"
done < <(tail -n +2 "$file_in")
echo '</TariffRecords>' >> "$file_out"

Things to note:

We are skipping the CSV header (tail -n +2 starts output at the second line) by:

<(tail -n +2 $file_in)

Generate a "foreach" cycle over the range from 1 to $NUM_OF_COLS, which represents the tag numbers (the array index is one less), by:

$(seq 1 $NUM_OF_COLS)

Append string by:

tariff="${tariff}......"

Using

echo -e ...

in order to preserve new lines and nice formatting; you might instead use another utility such as xmllint for pretty-printing.
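
For the 278-column case, the column count does not have to be hardcoded. The sketch below reads it from the header line instead. The function name csv_to_xml is made up for illustration. It assumes simple CSV: Unix line endings, no quoted fields, no commas inside values, and no characters such as & or < that would need XML escaping.

```shell
#!/usr/bin/env bash
# Sketch only: csv_to_xml is a hypothetical helper name.
csv_to_xml() {
  local file_in=$1 file_out=$2
  local -a header arry
  local ncols i

  # Read the header line once to learn how many columns each row should have.
  IFS=',' read -r -a header < "$file_in"
  ncols=${#header[@]}

  {
    echo '<?xml version="1.0"?>'
    echo '<TariffRecords>'
    # The || test keeps a final line that lacks a trailing newline.
    while IFS=',' read -r -a arry || (( ${#arry[@]} > 0 )); do
      echo '  <Tariff>'
      for (( i = 1; i <= ncols; i++ )); do
        # read drops a trailing empty field; the missing array element
        # expands to an empty string, so the tag is still written.
        printf '    <A%d>%s</A%d>\n' "$i" "${arry[i-1]}" "$i"
      done
      echo '  </Tariff>'
    done < <(tail -n +2 "$file_in")   # start at line 2, i.e. after the header
    echo '</TariffRecords>'
  } > "$file_out"
}
```

It could be called as csv_to_xml Prepaid_plan_voucher.csv Prepaid_plan_voucher.xml. If xmllint is installed, xmllint --format on the result re-indents it and also reports whether the output is well-formed.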

EDIT: For multiple files

In order to process multiple files, replace the hardcoded:

file_in="Prepaid_plan_voucher.csv"
file_out="Prepaid_plan_voucher.xml"

by

file_in="$1" # Take the name as an argument from command line
file_out="${1%.csv}.xml" # Remove csv suffix and append xml

and run the script from command line for every csv file, e.g. like this:

$ for f in *.csv; do ./ourscript.sh "$f"; done
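
As a rough illustration of the suffix rewrite and the loop, here is a small sketch. The helper name xml_name_for is made up for this example. Iterating over the glob directly and quoting "$f" keeps file names that contain spaces in one piece.

```shell
#!/usr/bin/env bash
# Sketch only: xml_name_for is a hypothetical helper name.
xml_name_for() {
  # ${1%.csv} strips a trailing ".csv" if present; then ".xml" is appended.
  printf '%s\n' "${1%.csv}.xml"
}

for f in *.csv; do
  # If no CSV file exists, the glob stays literal; skip it.
  [ -e "$f" ] || continue
  echo "$f -> $(xml_name_for "$f")"
done
```

In the real loop, the echo line would be replaced by the call to the conversion script.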

Upvotes: 2
