Reputation: 127
My process for scraping data from each URL (looping over all the given links) looks like this:
for url in urls:
    page = requests.get(url)
    # fetch and process the page here; each page yields one car's info
    print(car.name)
    print(car_table)
and the output is:
BMW
['color','red','weight','50kg','height','120cm','width','200cm','serial','','owner','']
FORD
['color','','weight','','height','','width','','serial','','owner','']
HONDA
['color','blue','weight','60kg','height','','width','160cm','serial','OMEGA','owner','']
At the end, how can I build a DataFrame like the one below, given that I don't know the number of car fields (columns) or the number of cars (index) in advance? The field names should become the columns and the car names the index.
print(car_df)
-----|color|weight|height|width|serial|owner
BMW  |red  |50    |120   |200  |      |
FORD |     |      |      |     |      |
HONDA|blue |60    |      |160  |OMEGA |
Any help is appreciated :)
Upvotes: 0
Views: 69
Reputation: 4548
This approach builds a list of dicts as we iterate through the urls, and then after the loop converts that list to a DataFrame. I'm assuming that car_table always alternates a column name followed by its value, over and over again.
import pandas as pd
import numpy as np
#Creating lists from your output instead of requesting from the url since you didn't share that
car_names = ['BMW','FORD','HONDA']
car_tables = [
    ['color','red','weight','50kg','height','120cm','width','200cm','serial','','owner',''],
    ['color','','weight','','height','','width','','serial','','owner',''],
    ['color','blue','weight','60kg','height','','width','160cm','serial','OMEGA','owner',''],
]
urls = range(len(car_names))
all_car_data = []
for url in urls:
    car_name = car_names[url] #using car_name instead of car.name for this example
    car_table = car_tables[url] #again, you get this value some other way
    car_data = {'name':car_name}
    columns = car_table[::2] #starting from 0, take every other entry to just get the columns
    values = car_table[1::2] #starting from 1, take every other entry to just get the values
    #Zip the columns together with the values, then iterate and update the dict
    for col,val in zip(columns,values):
        car_data[col] = val
    #Add the dict to a list to keep track of all the cars
    all_car_data.append(car_data)
#Convert to a dataframe
df = pd.DataFrame(all_car_data)
#df = df.replace({'':np.nan}) #you can use this if you want to replace the '' with NaNs
df
Output:
    name color weight height  width serial owner
0    BMW   red   50kg  120cm  200cm
1   FORD
2  HONDA  blue   60kg         160cm  OMEGA
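The question asked for the car names as the index and said the set of fields isn't known ahead of time, so here is a minimal sketch of how the same list-of-dicts idea might cover both. The sample data is made up for illustration: the 'doors' field and the shortened tables are assumptions, not something from your pages. It relies on pd.DataFrame taking the union of the dict keys as columns and filling NaN where a car lacks a field.

```python
import pandas as pd

# Hypothetical scraped results; each car may expose a different set of fields
car_names = ['BMW', 'FORD', 'HONDA']
car_tables = [
    ['color', 'red', 'weight', '50kg', 'height', '120cm'],
    ['color', '', 'weight', ''],
    ['color', 'blue', 'weight', '60kg', 'doors', '4'],  # 'doors' is an invented extra field
]

records = []
for car_name, car_table in zip(car_names, car_tables):
    record = {'name': car_name}
    # Pair each field name (even positions) with its value (odd positions)
    record.update(zip(car_table[::2], car_table[1::2]))
    records.append(record)

# Columns become the union of all fields seen; missing fields are filled with NaN
car_df = pd.DataFrame(records).set_index('name')
print(car_df)
```

With set_index('name') the car names become the row labels, so a lookup like car_df.loc['BMW', 'color'] works directly. Fields that a car never reported show up as NaN, while fields it reported as empty stay as ''.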
Upvotes: 1