Karl

Reputation: 329

How to grep and replace across multiple files and multiple elements on OS X

I use this grep command line on OS X.

grep -E 'Title|Amount|AwardID|FirstName|LastName' *.xml

and the result is here:

<Title>ABC System</Title>
<Amount>50000</Amount>
<AwardID>1000</AwardID>
<FirstName>Name</FirstName>
<LastName>Thanks</LastName>

Next, I tried to use sed to strip the tags and keep only the values, but it does not work.

What options should I use to get this?

sed -i "" 's/Title//g'

Desired result, as a text file:

ABC System, 50000, 1000, Name, Thanks

Update

I can do it separately.

$ grep -E 'AwardID|AwardAmount|FirstName|LastName' 1433501.xml > test
$ sed -E '/AwardID|AwardAmount|FirstName|LastName/s/.*>([^<]+)<.*/\1/' test

43856 1433501 Faisal Hossain

$ sed -E '/AwardID|AwardAmount|FirstName|LastName/s/.*>([^<]+)<.*/\1/' test | paste -sd',' -

43856,1433501,Faisal,Hossain

But when I replace xxx.xml with *.xml, I need a newline after each file's output. What should I add?

Update

AwardTable

xml sel -t -v //AwardID -o , -v //AwardAmount -nl *.xml > AwardTable.csv

InvestigatorTable

xml sel -t     -v //AwardID  -m '//Investigator[RoleCode = "Principal Investigator"]' -o , -v FirstName -o , -v LastName  -b -o [PI]    -m '//Investigator[RoleCode = "Co-Principal Investigator"]' -o , -v FirstName -o , -v LastName  -b  -o [CoPI]   -nl *.xml

How should I get the data for InvestigatorTable? How can I produce the following format?

ID, Firstname, Lastname, Role
12345, FirstName, LastName, PI
12345, FirstName, LastName, Co-PI
12345, FirstName, LastName, Former-PI


xml sel -t     -v //AwardID -o , -v //AwardAmount     -m '//Investigator[RoleCode = "Principal Investigator"]' -o , -v FirstName -o , -v LastName -o [PI] -b     -m '//Investigator[RoleCode = "Former Principal Investigator"]' -o , -v FirstName -o , -v  LastName -o [FoPI]  -b     -m '//Investigator[RoleCode = "Co-Principal Investigator"]' -o , -v FirstName -o , -v LastName -o [CoPI] -b     -nl *.xml

I can get output like this:

1417948,93147,M. Lee,Allison[PI],Jennifer,Arrigo[CoPI],Cynthia,Chandler[CoPI],Kerstin,Lehnert[CoPI]
1417966,574209,Robb,Lindgren[PI]
1418062,253000,Julia,Coonrod[PI],Gary,Harrison[FoPI]

I can do it manually for now, but I would appreciate help automating it.

Update

Please help me get the results in this structure:

AwardID, FirstName, LastName, Role

Upvotes: 0

Views: 255

Answers (2)

glenn jackman

Reputation: 246807

awk would do it:

awk -v ORS=", " -F '[<>]' '
    /Title|Amount|AwardID|FirstName|LastName/ {print $3} 
    END {printf "\b\b \n"}
' << EOF
<Title>ABC System</Title>
<Amount>50000</Amount>
<AwardID>1000</AwardID>
<FirstName>Name</FirstName>
<LastName>Thanks</LastName>
EOF
ABC System, 50000, 1000, Name, Thanks  

With multiple files, I assume you want a newline for each file. GNU awk v4 has an extension: ENDFILE

gawk -v ORS=", " -F '[<>]' '
    /Title|Amount|AwardID|FirstName|LastName/ {print $3} 
    ENDFILE {printf "\b\b \n"}
' *.xml

otherwise it's a bit more work:

awk -v ORS=", " -F '[<>]' '
    # terminate the previous line before printing anything from the next file
    FNR == 1 && FILENAME != ARGV[1] {printf "\b\b \n"}
    /Title|Amount|AwardID|FirstName|LastName/ {print $3} 
    END {printf "\b\b \n"}
' *.xml
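
The "\b\b" trick only hides the trailing ", " on a terminal; the backspace characters still end up in a redirected file. Here is a rough sketch of a variant that builds each line in a variable instead. The function name join_fields is just something I made up, and it assumes each element sits on its own line as in the sample:

```shell
# join_fields (hypothetical helper): print one comma-separated line per input file.
# Assumes every element is on its own line, e.g. <Title>ABC System</Title>.
join_fields() {
    awk -F '[<>]' '
        # first line of a later file: flush the line built for the previous file
        FNR == 1 && NR > 1 { print line; line = ""; n = 0 }
        /Title|Amount|AwardID|FirstName|LastName/ {
            # prepend ", " only when a value has already been stored
            line = (n++ ? line ", " : "") $3
        }
        END { if (NR > 0) print line }
    ' "$@"
}

# Usage: join_fields *.xml > result.txt
```

This should work with any POSIX awk, so it does not depend on gawk's ENDFILE.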

For robustness, you should be using an XML parser or XSLT transformation.


Given your sample xml files, here's a solution using xmlstarlet, an xml processing tool I like:

xmlstarlet sel -t -v //AwardTitle -o , -v //AwardAmount -o , -v //AwardID -m //Investigator -o , -v FirstName -o , -v LastName -b -nl 1419538.xml 1424234.xml 
IBDR: Workshop on Successful Approaches for Development and Dissemination of Instrumentation for Biological Research - May 1-2, 2014; Rosslyn, VA,49990,1419538,Sameer,Sonkusale,Valencia,Koomson,Eduardo,Rosa-Molinar
RAPID: Role of Physical, Chemical and Diffusion Properties of 4-Methyl-cyclohexane methanol in Remediating Contaminated Water and Water Pipes,49999,1424234,Daniel,Gallagher,Andrea,Dietrich,Paolo,Scardina

If you want to use another XSLT tool, here's the generated stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" version="1.0" extension-element-prefixes="exslt">
  <xsl:output omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/">
    <xsl:call-template name="value-of-template">
      <xsl:with-param name="select" select="//AwardTitle"/>
    </xsl:call-template>
    <xsl:text>,</xsl:text>
    <xsl:call-template name="value-of-template">
      <xsl:with-param name="select" select="//AwardAmount"/>
    </xsl:call-template>
    <xsl:text>,</xsl:text>
    <xsl:call-template name="value-of-template">
      <xsl:with-param name="select" select="//AwardID"/>
    </xsl:call-template>
    <xsl:for-each select="//Investigator">
      <xsl:text>,</xsl:text>
      <xsl:call-template name="value-of-template">
        <xsl:with-param name="select" select="FirstName"/>
      </xsl:call-template>
      <xsl:text>,</xsl:text>
      <xsl:call-template name="value-of-template">
        <xsl:with-param name="select" select="LastName"/>
      </xsl:call-template>
    </xsl:for-each>
    <xsl:value-of select="'&#10;'"/>
  </xsl:template>
  <xsl:template name="value-of-template">
    <xsl:param name="select"/>
    <xsl:value-of select="$select"/>
    <xsl:for-each select="exslt:node-set($select)[position()&gt;1]">
      <xsl:value-of select="'&#10;'"/>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

The schema is not great. Specifically, it's not flexible: what if there are more than 5 investigators? You need something like this, which is perhaps simpler:

Award table: id, title, amount
AwardInvestigators table: award_id, firstname, lastname, role
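
If you want rows in that AwardInvestigators shape (one line per investigator), something along these lines might do as a starting point. It is a line-based awk sketch, not a real XML parser: the function name investigator_rows is made up, and it assumes that AwardID, FirstName, LastName and RoleCode each sit on their own line and that every investigator is closed by a </Investigator> line.

```shell
# investigator_rows (hypothetical helper): print "AwardID,FirstName,LastName,RoleCode",
# one row per <Investigator> block, for every file given as an argument.
investigator_rows() {
    awk -F '[<>]' '
        # print the buffered rows of the current file, then reset the buffer
        function flush(   i) {
            for (i = 1; i <= n; i++) print id "," row[i]
            n = 0; id = ""
        }
        FNR == 1 && NR > 1 { flush() }      # a new file starts
        /<AwardID>/        { id = $3 }
        /<FirstName>/      { first = $3 }
        /<LastName>/       { last = $3 }
        /<RoleCode>/       { role = $3 }
        /<\/Investigator>/ {
            row[++n] = first "," last "," role
            first = last = role = ""
        }
        END { flush() }
    ' "$@"
}

# Usage: investigator_rows *.xml > InvestigatorTable.csv
```

The rows are buffered until the end of each file so that the position of AwardID relative to the Investigator blocks does not matter. The role is printed as the RoleCode text found in the file; shortening it to PI or Co-PI would need an extra mapping step.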


BTW, I read the question more carefully. I've amended my xmlstarlet command a bit to ensure the Principal Investigator's name is first:

xmlstarlet sel -t \
    -v //AwardID -o , -v //AwardAmount \
    -m '//Investigator[RoleCode = "Principal Investigator"]' -o , -v FirstName -o , -v LastName  -b \
    -m '//Investigator[RoleCode = "Co-Principal Investigator"]' -o , -v FirstName -o , -v LastName  -b \
    -nl \
*.xml

Upvotes: 1

jaypal singh

Reputation: 77095

Here is another way to do it:

sed -nE '/Title|Amount|AwardID|FirstName|LastName/s/.*>([^<]+)<.*/\1/p' *.xml | paste -sd',' -

With your sample data, it gave the following output:

$ sed -nE '/Title|Amount|AwardID|FirstName|LastName/s/.*>([^<]+)<.*/\1/p' xmlfile | paste -sd',' -
Collaborative Research: Using the Rurutu hotspot to evaluate mantle motion and absolute plate motion models,137715,1433097,Jasper,Konter
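
Note that with *.xml the pipeline above joins the values from all files into a single line, because paste sees one combined stream. If you want one line per file, a loop is probably the simplest fix. This is only a sketch, and the function name fields_per_file is made up for illustration:

```shell
# fields_per_file (hypothetical helper): run the same sed | paste pipeline
# once per file, so each file yields its own comma-separated line.
fields_per_file() {
    for f in "$@"; do
        sed -nE '/Title|Amount|AwardID|FirstName|LastName/s/.*>([^<]+)<.*/\1/p' "$f" |
            paste -sd',' -
    done
}

# Usage: fields_per_file *.xml > result.txt
```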

Upvotes: 2
