Stian Håklev
Stian Håklev

Reputation: 1290

How to automatically turn BibTex citation into something parseable by Zotero?

I have a citation system which publishes users notes to a wiki (Researchr). Programmatically, I have access to the full BibTeX record of each entry, and I also display this on the individual pages (for example - click on BibTeX). This is in the interest of making it easy for users of other citation manager to automatically import the citation of a paper that interests them. I would also like other citation managers, especially Zotero, to be able to automatically detect and import a citation.

Zotero lists a number of ways of exposing metadata that it will understand, including meta tags with RDF, COiNS, Dublin Core and unAPI. Is there a Ruby library for converting BibTeX to any of these standards automatically - or a Javascript library? I could probably create something, but if something existed, it would be far more robust (BibTeX has so many publication types and fields etc).

Upvotes: 6

Views: 926

Answers (3)

a.t.
a.t.

Reputation: 2779

By now one can simply import bibtex files of type .bib directly in Zotero. However, I noticed my bibtex files were often less complete than Zotero (in particular they often missed a DOI), and I did not find an "auto-complete" function (based on the data in the bibtex entries) in Zotero.

So I import the .bib file with Zotero, to ensure they are all in there. Then I run a python script that gets all the missing DOI's it can find for the entries in that .bib file, and exports them to a space separated .txt file.:

# pip install habanero
from habanero import Crossref
import re


def titletodoi(keyword):
    cr = Crossref()
    result = cr.works(query=keyword)
    items = result["message"]["items"]
    item_title = items[0]["title"]
    tmp = ""
    for it in item_title:
        tmp += it
    title = keyword.replace(" ", "").lower()
    title = re.sub(r"\W", "", title)
    # print('title: ' + title)
    tmp = tmp.replace(" ", "").lower()
    tmp = re.sub(r"\W", "", tmp)
    # print('tmp: ' + tmp)
    if title == tmp:
        doi = items[0]["DOI"]
        return doi
    else:
        return None


def get_dois(titles):
    dois = []
    for title in titles:
        try:
            doi = titletodoi(title)
            print(f"doi={doi}, title={title}")
            if not doi is None:
                dois.append(doi)
        except:
            pass
            # print("An exception occurred")
    print(f"dois={dois}")
    return dois


def read_titles_from_file(filepath):
    with open(filepath) as f:
        lines = f.read().splitlines()
    split_lines = splits_lines(lines)
    return split_lines


def splits_lines(lines):
    split_lines = []
    for line in lines:
        new_lines = line.split(";")
        for new_line in new_lines:
            split_lines.append(new_line)
    return split_lines


def write_dois_to_file(dois, filename, separation_char):
    textfile = open(filename, "w")
    for doi in dois:
        textfile.write(doi + separation_char)
    textfile.close()


filepath = "list_of_titles.txt"
titles = read_titles_from_file(filepath)
dois = get_dois(titles)
write_dois_to_file(dois, "dois_space.txt", " ")
write_dois_to_file(dois, "dois_per_line.txt", "\n")

The DOIs of the .txt are fed into magic wand of Zotero. Next, I (manually) remove the duplicates by choosing the latest added entry (because that comes from the magic wand with the most data).

After that, I run another script to update all the reference id's in my .tex and .bib files to those generated by Zotero:

# Importing library
import bibtexparser
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import *
import os, fnmatch

import Levenshtein as lev


# Let's define a function to customize our entries.
# It takes a record and return this record.
def customizations(record):
    """Use some functions delivered by the library

    :param record: a record
    :returns: -- customized record
    """
    record = type(record)
    record = author(record)
    record = editor(record)
    record = journal(record)
    record = keyword(record)
    record = link(record)
    record = page_double_hyphen(record)
    record = doi(record)
    return record


def get_references(filepath):
    with open(filepath) as bibtex_file:
        parser = BibTexParser()
        parser.customization = customizations
        bib_database = bibtexparser.load(bibtex_file, parser=parser)
        # print(bib_database.entries)
    return bib_database


def get_reference_mapping(main_filepath, sub_filepath):
    found_sub = []
    found_main = []
    main_into_sub = []

    main_references = get_references(main_filepath)
    sub_references = get_references(sub_filepath)

    for main_entry in main_references.entries:
        for sub_entry in sub_references.entries:

            # Match the reference ID if 85% similair titles are detected
            lev_ratio = lev.ratio(
                remove_curly_braces(main_entry["title"]).lower(),
                remove_curly_braces(sub_entry["title"]).lower(),
            )
            if lev_ratio > 0.85:
                print(f"lev_ratio={lev_ratio}")

                if main_entry["ID"] != sub_entry["ID"]:
                    print(f'replace: {sub_entry["ID"]} with: {main_entry["ID"]}')
                    main_into_sub.append([main_entry, sub_entry])

                    # Keep track of which entries have been found
                    found_sub.append(sub_entry)
                    found_main.append(main_entry)
    return (
        main_into_sub,
        found_main,
        found_sub,
        main_references.entries,
        sub_references.entries,
    )


def remove_curly_braces(string):
    left = string.replace("{", "")
    right = left.replace("{", "")
    return right


def replace_references(main_into_sub, directory):
    for pair in main_into_sub:
        main = pair[0]["ID"]
        sub = pair[1]["ID"]
        print(f"replace: {sub} with: {main}")

        # UNCOMMENT IF YOU WANT TO ACTUALLY DO THE PRINTED REPLACEMENT
        # findReplace(latex_root_dir, sub, main, "*.tex")
        # findReplace(latex_root_dir, sub, main, "*.bib")


def findReplace(directory, find, replace, filePattern):
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, filePattern):
            filepath = os.path.join(path, filename)
            with open(filepath) as f:
                s = f.read()
            s = s.replace(find, replace)
            with open(filepath, "w") as f:
                f.write(s)


def list_missing(main_references, sub_references):
    for sub in sub_references:
        if not sub["ID"] in list(map(lambda x: x["ID"], main_references)):
            print(f'the following reference has a changed title:{sub["ID"]}')


latex_root_dir = "some_path/"
main_filepath = f"{latex_root_dir}latex/Literature_study/zotero.bib"
sub_filepath = f"{latex_root_dir}latex/Literature_study/references.bib"
(
    main_into_sub,
    found_main,
    found_sub,
    main_references,
    sub_references,
) = get_reference_mapping(main_filepath, sub_filepath)
replace_references(main_into_sub, latex_root_dir)
list_missing(main_references, sub_references)


# For those references which have levenshtein ratio below 85 you can specify a manual swap:
manual_swap = []  # main into sub
# manual_swap.append(["cantley_impact_2021","cantley2021impact"])
# manual_swap.append(["widemann_envision_2021","widemann2020envision"])
for pair in manual_swap:
    main = pair[0]
    sub = pair[1]
    print(f"replace: {sub} with: {main}")

    # UNCOMMENT IF YOU WANT TO ACTUALLY DO THE PRINTED REPLACEMENT
    # findReplace(latex_root_dir, sub, main, "*.tex")
    # findReplace(latex_root_dir, sub, main, "*.bib")

Upvotes: 0

Jeen Broekstra
Jeen Broekstra

Reputation: 22053

There's a BibTeX2RDF convertor available here, might be what you're after.

Upvotes: 2

adam.smith
adam.smith

Reputation: 31

unAPI is not a data standard - it's a way to serve data (to Zotero and other programs). Zotero imports Bibtex, so serving Bibtex via unAPI works just fine. Inspire is an example of a site that does that: http://inspirehep.net/

Upvotes: 1

Related Questions