Reputation: 41
I have a CSV list of 335 gene accession numbers, and I want to put each of them into a URL of this form:
https://www.ncbi.nlm.nih.gov/nuccore/DQ147858.1?report=fasta
where the 8-character gene accession number (DQ147858 above) is different in each URL and comes from the corresponding entry in the CSV list.
I also need to know how to access all the generated URLs with Requests.
Any help is very much appreciated.
Upvotes: 1
Views: 1749
Reputation: 1866
You can generalize the URL creation with a function:
    def build_url(gene):
        return 'https://www.ncbi.nlm.nih.gov/nuccore/' + gene + '.1?report=fasta'
Then, to build the URL for every gene, you can iterate over the initial list and apply the function build_url to each one.
# Generic extraction of list genes from csv
genes = extract_list(csv)
# Using list comprehension
genes_urls = [build_url(gene) for gene in genes]
# Using regular for
genes_urls = []
for gene in genes:
    genes_urls.append(build_url(gene))
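The extract_list call above is left generic. As a minimal sketch of what it might look like, assuming the CSV has no header row and the accession numbers sit in a single column, the standard csv module is enough. The path argument and the column parameter are hypothetical and not part of the original answer:

```python
import csv


def extract_list(csv_path, column=0):
    """Return the non-empty values found in one column of a CSV file.

    Assumption: the file has no header row and `column` is the
    zero-based index of the column holding the accession numbers.
    """
    genes = []
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            # Skip blank lines and rows that are too short.
            if len(row) > column and row[column].strip():
                genes.append(row[column].strip())
    return genes
```

With something like this in place, the list of genes would come from a call such as extract_list("genes.csv"), where the file name is only a placeholder.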
Following this answer, to make a request, you would simply do:
import requests
# Using list comprehension
res = [requests.get(url) for url in genes_urls]
# Using regular for
res = []
for url in genes_urls:
    res.append(requests.get(url))
Additionally, you can use multithreading to speed up the requests.
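One possible way to do that is a thread pool from the standard library. The sketch below is only an illustration: the names fetch_text and fetch_all, the 30-second timeout, and the default of 4 workers are assumptions, and a small worker count is probably wise so the server is not flooded.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_text(url):
    """Download one URL with Requests and return the body as text."""
    # Imported here so the pool logic below can be tried with a stand-in fetcher.
    import requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # raise on 4xx/5xx responses
    return response.text


def fetch_all(urls, fetch=fetch_text, max_workers=4):
    """Apply `fetch` to every URL concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

Calling fetch_all(genes_urls) would then return one response body per URL, in the same order as the list. If any single request fails, its exception is raised when the results are collected.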
Upvotes: 1
Reputation: 390
To read a .csv, I use this:
result = []
with open("file.csv") as f:
    for line in f:
        # strip() drops the trailing newline before splitting on commas
        result.append(line.strip().split(','))
This will give you, for each line, a list of the elements between the commas. I don't know which of these elements you need, but take a look at result[0] to see which index you need.
With the index you need,
fmtstr = "https://www.ncbi.nlm.nih.gov/nuccore/{}?report=fasta"
urls = []
for lst in result:
    urls.append(fmtstr.format(lst[desired_index]))
Then, you can iterate through the list of urls and use the requests library as you desire.
This isn't the most compact way of doing things, but it's functional and separates steps for simpler viewing.
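For the last step, iterating over the URLs with Requests could look roughly like the sketch below. The function name fetch_pages, the timeout value, and the choice to return a dict keyed by URL are all assumptions made for illustration. Reusing a single Session keeps the connection open between requests.

```python
def fetch_pages(urls, session=None, timeout=30):
    """Request each URL in turn and return a dict of {url: response text}."""
    if session is None:
        # Imported here so a stand-in session can be passed in for testing.
        import requests

        session = requests.Session()
    pages = {}
    for url in urls:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()  # stop on the first 4xx/5xx response
        pages[url] = response.text
    return pages
```

Calling fetch_pages(urls) with the list built above would download the pages one after another.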
Upvotes: 1
Reputation: 146
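# Assumes the file is one comma-separated run of accession numbers
with open('PATH_TO_CSV', 'r') as csv_file:
    # read() returns a single string, which can be split on commas
    for gene_number in csv_file.read().split(','):
        gene_number = gene_number.strip()  # drop whitespace and newlines
        URL = 'https://www.ncbi.nlm.nih.gov/nuccore/' + gene_number + '.1?report=fasta'
        # request parsing here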
Upvotes: 0