rdv


How to search using NCBI E-utilities API if your queries contain underscores

I want to search for academic articles using a list of gene IDs. The IDs contain underscores that separate the chromosome number from the unique gene number (e.g. XXX1_1230).

I tried NCBI's E-utilities from Python:

import requests
import xml.etree.ElementTree as ET

# Read the gene IDs from the file, one per line, skipping blank lines
file_path = 'genes.txt'
with open(file_path, 'r') as file:
    gene_ids = [line.strip() for line in file if line.strip()]

# Define your API key
api_key = 'my_API_key'

# Base URL for NCBI E-utilities
base_url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'

# Prepare the search query
params = {
    'db': 'pubmed',
    'term': ' OR '.join(f'"{gene_id}"' for gene_id in gene_ids),  # Enclose each gene ID in quotes
    'api_key': api_key,
    'retmax': 100
}

# Send the request
response = requests.get(base_url, params=params)

# Check the response status code
if response.status_code != 200:
    print(f"Error: Received status code {response.status_code}")
else:
    try:
        # Parse the XML response
        root = ET.fromstring(response.text)
        count = root.find('Count').text
        id_list = [id_elem.text for id_elem in root.findall('.//Id')]
        
        print(f'Total results: {count}')
        print(f'IDs: {id_list}')
    except ET.ParseError as e:
        print(f"Error: Unable to parse XML response: {e}")
        print("Response text:", response.text)

However, the results show that each term is split at the underscore into two separate strings, so the search does not match the full gene ID. How can I search for the IDs as single terms?
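To make the splitting easier to inspect, here is a minimal sketch that builds the same quoted term and reads the `QueryTranslation` element from an ESearch XML response, which reports how the server interpreted the query. The helper names `build_term` and `query_translation` are my own, and `sample_xml` is a hand-written placeholder rather than real NCBI output; in practice `response.text` from the request above would be passed in instead.

```python
import xml.etree.ElementTree as ET


def build_term(gene_ids):
    """Join gene IDs into one ESearch term, each ID wrapped in double quotes."""
    return ' OR '.join(f'"{gene_id}"' for gene_id in gene_ids)


def query_translation(xml_text):
    """Return the text of the QueryTranslation element, or None if it is absent."""
    root = ET.fromstring(xml_text)
    elem = root.find('QueryTranslation')
    return elem.text if elem is not None else None


# Hypothetical, hand-written response for illustration only (NOT real NCBI output)
sample_xml = """<eSearchResult>
  <Count>0</Count>
  <IdList/>
  <QueryTranslation>placeholder translated query</QueryTranslation>
</eSearchResult>"""

term = build_term(['XXX1_1230', 'XXX2_0042'])
print(term)                           # the string sent as the 'term' parameter
print(query_translation(sample_xml))  # how the server reports it read the query
```

Comparing the printed term with the reported translation for a single gene ID should show exactly where the underscore is being treated as a separator.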
