Reputation: 83
I have one xml file like this:
<dbReference type="PM" id="17224074"/>
<dbReference type="DOI" id="10.1186/bcr1637"/>
</citation>
<scope>VARIANTS ILE-282 AND ASN-777</scope>
</reference>
<comment type="function">
<text evidence="24">Calcium.</text>
</comment>
<comment type="function">
<text evidence="24">Has a strong inhibitory effect on APP C99 and C83 production.</text>
</comment>
<comment type="subunit">
<text evidence="5 13">Homodimer; disulfide-linked.</text>
</comment>
<comment type="interaction">
<interactant intactId="EBI-727477"/>
<interactant intactId="EBI-7644904">
<id>Q9JIY2</id>
<label>Cbll1</label>
</interactant>
<organismsDiffer>true</organismsDiffer>
<experiments>21</experiments>
</comment>
I want to extract only the information in
<comment type="function">...</comment>
In this example being: 'Calcium.' AND 'as a strong inhibitory effect on APP C99 and C83 production.'
I have this table where I want to save the data:
CREATE TABLE IF NOT EXISTS INFORMATION (id varchar(255) NOT NULL, name varchar(255), entry varchar(255), comment longtext, PRIMARY KEY (id));
where I would save 'Calcium.' AND 'as a strong inhibitory effect on APP C99 and C83 production.' in column named 'comment'. I thought that I could insert this info directly from the xml to the table with LOAD XML but my xml file has too many different fields. How do I do this? Do I have to first extract the data from the xml and then insert into the table?
Upvotes: 0
Views: 87
Reputation: 16551
One option that can be useful:
File: '/path/to/xml/file/xmlfile.xml'
<dbReference type="PM" id="17224074"/>
<dbReference type="DOI" id="10.1186/bcr1637"/>
</citation>
<scope>VARIANTS ILE-282 AND ASN-777</scope>
</reference>
<comment type="function">
<text evidence="24">Calcium.</text>
</comment>
<comment type="function">
<text evidence="24">Has a strong inhibitory effect on APP C99 and C83 production.</text>
</comment>
<comment type="subunit">
<text evidence="5 13">Homodimer; disulfide-linked.</text>
</comment>
<comment type="interaction">
<interactant intactId="EBI-727477"/>
<interactant intactId="EBI-7644904">
<id>Q9JIY2</id>
<label>Cbll1</label>
</interactant>
<organismsDiffer>true</organismsDiffer>
<experiments>21</experiments>
</comment>
MySQL Command-line:
mysql> DROP TEMPORARY TABLE IF EXISTS `temp_information`;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> DROP TABLE IF EXISTS `information`;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> CREATE TABLE IF NOT EXISTS `information` (
-> `id` VARCHAR(255) NOT NULL,
-> `name` VARCHAR(255),
-> `entry` VARCHAR(255),
-> `comment` LONGTEXT,
-> PRIMARY KEY (`id`)
-> );
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TEMPORARY TABLE IF NOT EXISTS `temp_information` (
-> `id` INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
-> `type` VARCHAR(20),
-> `text` TEXT,
-> `evidence` VARCHAR(20)
-> );
Query OK, 0 rows affected (0.00 sec)
mysql> LOAD XML INFILE '/path/to/xml/file/xmlfile.xml'
-> INTO TABLE `temp_information`
-> ROWS IDENTIFIED BY '<comment>';
Query OK, 4 rows affected (0.00 sec)
Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
mysql> INSERT INTO `information` (`id`, `comment`)
-> SELECT UUID(), GROUP_CONCAT(`text` ORDER BY `id` SEPARATOR ' ')
-> FROM
-> `temp_information`
-> WHERE
-> `type` = 'function'
-> GROUP BY `evidence`;
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> SELECT
-> `id`,
-> `name`,
-> `entry`,
-> `comment`
-> FROM
-> `information`;
+--------------------------------------+------+-------+------------------------------------------------------------------------+
| id | name | entry | comment |
+--------------------------------------+------+-------+------------------------------------------------------------------------+
| e720d259-fcde-11e5-be3f-a4badbf9ce21 | NULL | NULL | Calcium. Has a strong inhibitory effect on APP C99 and C83 production. |
+--------------------------------------+------+-------+------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> DROP TEMPORARY TABLE IF EXISTS `temp_information`;
Query OK, 0 rows affected (0.00 sec)
Check:
Upvotes: 2
Reputation: 1966
use an XML parser (such as SAXParser) to parse the file first, then iterate over nodes, looking for comment nodes. For each one that has a type of "function", put the node text into the DB.
Upvotes: 1