JMnedict.xml file Trying to purge entries by xml tag?

Question

JMnedict is a Japanese Name file that's free online in xml. I have not found another way it was stored. It is free to use as long as credit is given. Found here: https://www.edrdg.org/enamdict/enamdict_doc.html (I did not write the code below, I only copy-pasted it).

I get the principles of xml and why it exists, but nothing beyond that. I am not that savvy to specific programs that handle them besides text editors and maybe OpenOffice.

I downloaded it for the Japanese names. I want to filter the entries only to the fem, masc, given, surname, and unclass

I hope to make a Japanese name generator from the data and this is the best source material for it. (I made a Korean one already).

The thing is that the list of entities is:

character, company name, creature, deity, document, event, female given name or forename, fiction, given name, group, legend, male given name or forename, mythology, object, organization name, other, person, place , product name, religion, service, ship name, railway station, family or surname, unclassified name, work.

And the xml file is 152.3 MB.

I'd like to drop all of the entries that do not fall into the categories I want.

I'm looking for an efficient way to trim the file, so I can upload it to the online database.

Structure of the entries looks like this (Not sure if it helps):


5000000

ゝ泉


ちゅせん


&given;
Chusen

(as an example)

and trim out entries such as this:


5000198

あかり博物館


あかりはくぶつかん


&place;
Akari Museum

So is there an efficient way to purge all of the entries I don't want in BBedit, something similar or a method I'm not familiar with? I need to trim the file so I can upload it to my database (and hopefully it being shorter will help it upload.)

I'm on a Mac if you're suggesting applications.

OR is there a method I've not thought of to purge the entries in a different method?

What I've Tried:

I tried to upload the xml file to my database online directly, but it was too big, even when zipped. (And then I would do MySQL DROPS to remove the offending entries), but file was too big and phpMyAdmin gave up and spat out an error that the file was incomplete.
I did try trimming the file to a few entries for phpMyAdmin, and that uploaded successfully, but uploading it in sections seems inefficient, especially with the file so long and it having entries I don't want anyway.
I tried to convert it to a csv file, but it is too big and the automatic converters online can't handle it. (Converting it to JSON or similar has similar problems)
I tried OpenOffice, but again, it can't handle the file being that large and the application crashed on me. (I thought I could Sort then kill)
I can open it in BBEDIT and manually trim the file, but the file is very long and not organized, so it is super inefficient. I rather be able to sort the file by name_type. And kill the entries that don't belong.

JMnedict.xml file Trying to purge entries by xml tag?

Answers (1)

Related Questions