Extract part of a URL stored in a list within a dataframe - Python

Question

I am trying to extract only the numeric part (25709 in the example below) and assign it to a variable, let's call it athleteID, that I can later insert into a dynamic URL to iterate through and send search requests:

'Zola Budd'

I have these URLs (or partial URLs) stored in a list within a dataframe. I have iterated over it twice using the split() function, first splitting on '=', and got it to the point below.

 id_list = []
 for url in df2['athleteURL']:
     # Split on a plain '='; '\=' would look for a literal backslash followed by '='
     parts = url.split('=')
     id_list.append(parts)
 print(id_list)

This produces a list of lists; one entry is shown below as an example:

 'Zola Budd'

I then did a second iteration, this time splitting on the double-quote character ('"'), and got it to the below:

 id_list2 = []
 # Split each piece of the third row on the double-quote character
 for piece in id_list[2]:
     id_list2.append(piece.split('"'))

 athleteIDnumber = id_list2[2]
 print(athleteIDnumber)

 ['2967288', ' target']

However, this is where I am now stuck: the ID is still just one element within a list. I am also not sure this is the most efficient way to extract it, as I struggled to get regex functions to work.

Any advice or support would be appreciated. Thanks, Chris
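
For comparison with the two split() passes above, here is a minimal sketch of a single-pass regex approach. It assumes each stored value contains the ID as a run of digits directly after an '=' sign; the sample strings, the page name profile.aspx, the parameter name athleteid, and the helper extract_athlete_id are all hypothetical stand-ins, since the real contents of df2['athleteURL'] are not shown.

```python
import re

# Hypothetical sample values; the real strings in df2['athleteURL'] may differ.
athlete_urls = [
    '<a href="profile.aspx?athleteid=25709" target="_blank">Zola Budd</a>',
    '<a href="profile.aspx?athleteid=2967288" target="_blank">Another Athlete</a>',
    'no id here',
]

# Capture one or more digits that directly follow an "=" sign.
ID_PATTERN = re.compile(r'=(\d+)')

def extract_athlete_id(url_fragment):
    """Return the first run of digits after '=' as a string, or None if absent."""
    match = ID_PATTERN.search(url_fragment)
    return match.group(1) if match else None

athlete_ids = [extract_athlete_id(u) for u in athlete_urls]
print(athlete_ids)  # ['25709', '2967288', None]
```

Each result is a plain string rather than a nested list, so it can be placed straight into a URL template. If the column holds plain strings, the same pattern should also work with pandas directly, e.g. df2['athleteURL'].str.extract(r'=(\d+)', expand=False), which returns a Series of the captured IDs.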
