I want to search for academic articles using a list of gene IDs. The IDs contain underscores that separate the chromosome number from the unique gene number (e.g. XXX1_1230).
I tried NCBI's E-utilities from Python:
import requests
import xml.etree.ElementTree as ET
# Read the gene IDs from the file, skipping blank lines
file_path = 'genes.txt'
with open(file_path, 'r') as file:
    gene_ids = [line.strip() for line in file if line.strip()]
# Define your API key
api_key = 'my_API_key'
# Base URL for NCBI E-utilities
base_url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'
# Prepare the search query
params = {
    'db': 'pubmed',
    'term': ' OR '.join(f'"{gene_id}"' for gene_id in gene_ids),  # Enclose each gene ID in quotes
    'api_key': api_key,
    'retmax': 100
}
# Send the request
response = requests.get(base_url, params=params)
# Check the response status code
if response.status_code != 200:
    print(f"Error: Received status code {response.status_code}")
else:
    try:
        # Parse the XML response
        root = ET.fromstring(response.text)
        count = root.find('Count').text
        id_list = [id_elem.text for id_elem in root.findall('.//Id')]
        print(f'Total results: {count}')
        print(f'IDs: {id_list}')
    except ET.ParseError as e:
        print(f"Error: Unable to parse XML response ({e})")
        print("Response text:", response.text)
However, the results show that each search term is split into two separate strings at the underscore, so the two halves of a gene ID are searched independently instead of as one ID.
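One way to confirm how the query is being interpreted is to read the QueryTranslation element that ESearch includes in its XML response alongside the ID list. The following is a minimal sketch, assuming the default XML response format; the helper names build_term and query_translation are hypothetical and only for illustration:

```python
import xml.etree.ElementTree as ET


def build_term(gene_ids):
    """Join gene IDs into one OR query, quoting each ID as in the code above."""
    return ' OR '.join(f'"{gene_id}"' for gene_id in gene_ids)


def query_translation(xml_text):
    """Return the text of <QueryTranslation> from an ESearch XML response.

    Returns None if the element is absent (assumption: default XML output).
    """
    root = ET.fromstring(xml_text)
    elem = root.find('.//QueryTranslation')
    return elem.text if elem is not None else None
```

Calling print(query_translation(response.text)) right after the request would show the query as PubMed actually ran it, which makes it easier to see where the underscore is being treated as a separator.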