Reputation: 33
I'm trying to download all the songs in the list at http://los40.com.ar/lista40/. I can download them manually, but I would like to automate the process. First I extracted the URLs with Beautiful Soup, but I can't navigate the result.
For example, this is the first song in the list:
var datos_cancion_1 = Array();
datos_cancion_1['url_audioenci'] = 'https://recursosweb.prisaradio.com/audios/dest/010002713547.mp4';
datos_cancion_1['url_muzu'] = '';
datos_cancion_1['url_youtube'] = 'https://www.youtube.com/watch?v=0S3enulCT8E';
datos_cancion_1['url_itunes'] = '';
datos_cancion_1['posicion'] = '1';
datos_cancion_1['url_caratula'] = 'https://recursosweb.prisaradio.com/fotos/dest/010002713548.jpg';
datos_cancion_1['titulo_cancion'] = '22';
datos_cancion_1['nombre_artista'] = 'Greeicy;Tini';
datos_cancion_1['idYes'] = 'Tini';
datos_cancion_1['VidAu'] = 0;
And I would like to obtain an array or JSON like: ['https://recursosweb.prisaradio.com/audios/dest/010002713547.mp4', 'https://recursosweb.prisaradio.com/fotos/dest/010002713548.jpg', 'Greeicy;Tini'], that is, [datos_cancion_1['url_audioenci'], datos_cancion_1['url_caratula'], datos_cancion_1['nombre_artista']].
This is my code, I hope you can help me:
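from bs4 import BeautifulSoup
import requests
import json
import re
import urllib.request  # "import urllib" alone does not load urllib.request
url = 'http://los40.com.ar/m/lista40/'
videos = []
response = requests.get(url)
bs = BeautifulSoup(response.text, "html.parser")
# Collect the inline <script> tags and skip the first eight
all_script = bs.find_all('script', language='javascript', type='text/javascript')
data = all_script[8:]
a = data[0].string
# Hard-coded for now: this is the list I want to build from the script text.
# datos_cancion_1 exists only in the page's JavaScript, so it cannot be referenced from Python.
b = ['https://recursosweb.prisaradio.com/audios/dest/010002713547.mp4', 'https://recursosweb.prisaradio.com/fotos/dest/010002713548.jpg', 'Greeicy;Tini']
print(a)
# Note: the source file is an .mp4 even though it is saved here with an .mp3 name
urllib.request.urlretrieve(b[0], b[2] + '.mp3')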
Upvotes: 1
Views: 948
Reputation: 11
You can try this:
song_list = [''.join(' '.join(i.text.split('\n')).split('=')).split(';') for i in data]
That will give you a list of lists, each one like this:
["'https://recursosweb.prisaradio.com/audios/dest/010002696230.mp4'",
" datos_cancion_2['url_muzu'] ''",
" datos_cancion_2['url_youtube'] "
"'https://www.youtube.com/watch?v1Jw_mhoCiFY'",
" datos_cancion_2['url_itunes'] ''",
" datos_cancion_2['posicion'] '2'",
" datos_cancion_2['url_caratula'] "
"'https://recursosweb.prisaradio.com/fotos/dest/010002696233.jpg'",
" datos_cancion_2['titulo_cancion'] 'Cristina'",
" datos_cancion_2['nombre_artista'] 'Sebastián Yatra'",
" datos_cancion_2['idYes'] 'Sebastian-Yatra'",
" datos_cancion_2['VidAu'] 0",
' ']
From here you should be able to arrange the final array however you wish.
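One caveat: the one-liner drops every '=', including the one inside the YouTube URL (that is why the output shows watch?v1Jw_mhoCiFY). If that matters, a regular expression may be easier to work with. The following is only a sketch: the names ASSIGNMENT, parse_songs and sample are made up for illustration, and it assumes every line looks like the ones in the question (single-quoted values with no apostrophes inside, or a bare value such as 0).

```python
import re

# Matches lines such as: datos_cancion_1['titulo_cancion'] = '22';
# and unquoted values such as: datos_cancion_1['VidAu'] = 0;
# Assumption: quoted values never contain an apostrophe.
ASSIGNMENT = re.compile(
    r"datos_cancion_(\d+)\['(\w+)'\]\s*=\s*(?:'([^']*)'|([^;]*));"
)

def parse_songs(script_text):
    """Return {song_number: {key: value}} for every datos_cancion_N found."""
    songs = {}
    for number, key, quoted, bare in ASSIGNMENT.findall(script_text):
        # Exactly one of the two value groups is non-empty (or both are empty for '')
        value = quoted if quoted else bare.strip()
        songs.setdefault(int(number), {})[key] = value
    return songs

# Sample text copied from the question, so the sketch runs without the network
sample = """
var datos_cancion_1 = Array();
datos_cancion_1['url_audioenci'] = 'https://recursosweb.prisaradio.com/audios/dest/010002713547.mp4';
datos_cancion_1['url_muzu'] = '';
datos_cancion_1['url_youtube'] = 'https://www.youtube.com/watch?v=0S3enulCT8E';
datos_cancion_1['url_caratula'] = 'https://recursosweb.prisaradio.com/fotos/dest/010002713548.jpg';
datos_cancion_1['nombre_artista'] = 'Greeicy;Tini';
datos_cancion_1['VidAu'] = 0;
"""

songs = parse_songs(sample)

# Build the [audio URL, cover URL, artist] rows the question asks for
rows = [
    [s.get('url_audioenci', ''), s.get('url_caratula', ''), s.get('nombre_artista', '')]
    for s in songs.values()
]
print(rows)
```

With the code from the question, the text to parse could be built with something like "\n".join(tag.string or "" for tag in data) and passed to parse_songs in place of sample.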
Hope this helps you.
Upvotes: 1