Ninja

Reputation: 127

Convert scraped list to pandas DataFrame using columns and index

The scraping loop, which processes each URL in a given list of links, looks like this:

import requests

for url in urls:
    page = requests.get(url)
    # fetch and parse the page here; each page yields the info for one car
    # (car and car_table come from that parsing step)
    print(car.name)
    print(car_table)

and the output is:

BMW
['color','red','weight','50kg','height','120cm','width','200cm','serial','','owner','']
FORD
['color','','weight','','height','','width','','serial','','owner','']
HONDA
['color','blue','weight','60kg','height','','width','160cm','serial','OMEGA','owner','']

At the end, how can I get a DataFrame like the one below? I don't know the number of car fields (columns) or the number of cars (index) in advance, but I want the fields to become the columns and the car names to become the index.

print(car_df)

-----|color|weight|height|width|serial|owner
BMW  |red   50     120    200 
FORD |
HONDA|blue  60            160   OMEGA  

any help appreciated :)

Upvotes: 0

Views: 69

Answers (1)

mitoRibo

Reputation: 4548

This approach builds a list of dicts as we iterate through the urls, and then after the loop we convert that list to a DataFrame. I'm assuming that car_table is always a column name followed by its value, over and over again.
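To make that assumption explicit, the pairing step could be wrapped in a small helper that fails loudly when a table does not alternate cleanly. This is only a sketch: pairs_to_dict is a hypothetical name, not something from the question or from pandas.

```python
def pairs_to_dict(flat):
    """Turn [key, value, key, value, ...] into {key: value, ...}."""
    # An odd length means some key has no value, so the alternating
    # column/value assumption does not hold for this table.
    if len(flat) % 2 != 0:
        raise ValueError(f"expected an even number of entries, got {len(flat)}")
    # Even positions are the column names, odd positions are the values.
    return dict(zip(flat[::2], flat[1::2]))


car_table = ['color', 'red', 'weight', '50kg', 'serial', '']
row = pairs_to_dict(car_table)
```

Because each car becomes its own dict, the set of columns does not need to be known in advance.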

import pandas as pd
import numpy as np

#Creating lists from your output instead of requesting from the url since you didn't share that
car_names = ['BMW','FORD','HONDA']
car_tables = [
    ['color','red','weight','50kg','height','120cm','width','200cm','serial','','owner',''],
    ['color','','weight','','height','','width','','serial','','owner',''],
    ['color','blue','weight','60kg','height','','width','160cm','serial','OMEGA','owner',''],
]
urls = range(len(car_names))


all_car_data = []
for url in urls:
    car_name = car_names[url] #using car_name instead of car.name for this example
    car_table = car_tables[url] #again, you get this value some other way
    
    car_data = {'name':car_name}
    
    columns = car_table[::2] #starting from 0, skip every other entry to just get the columns
    values = car_table[1::2] #starting from 1, skip every other entry to just get the values
    
    #Zip the columns together with the values, then iterate and update the dict
    for col,val in zip(columns,values):
        car_data[col] = val
    
    #Add the dict to a list to keep track of all the cars
    all_car_data.append(car_data)
    
#Convert to a dataframe
df = pd.DataFrame(all_car_data)
#df = df.replace({'':np.nan}) #you can use this if you want to replace the '' with NaNs (np.NaN was removed in NumPy 2.0)
df

Output:

    name color weight height width serial owner
0    BMW   red   50kg  120cm 200cm
1   FORD
2  HONDA  blue   60kg        160cm  OMEGA
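The question asked for the car names as the index and bare numbers in the measurement columns. One possible follow-up, sketched here with the rows hard-coded as the list of dicts built in the loop, is to call set_index on the name column, turn the empty strings into NaN, and extract the digits. Treating weight, height and width as the numeric columns is an assumption based on the sample data.

```python
import numpy as np
import pandas as pd

# Same rows as the list of dicts built in the loop, hard-coded for the sketch
all_car_data = [
    {'name': 'BMW', 'color': 'red', 'weight': '50kg', 'height': '120cm',
     'width': '200cm', 'serial': '', 'owner': ''},
    {'name': 'FORD', 'color': '', 'weight': '', 'height': '',
     'width': '', 'serial': '', 'owner': ''},
    {'name': 'HONDA', 'color': 'blue', 'weight': '60kg', 'height': '',
     'width': '160cm', 'serial': 'OMEGA', 'owner': ''},
]

# Use the car name as the index instead of the default 0, 1, 2
df = pd.DataFrame(all_car_data).set_index('name')

# Treat empty strings as missing values
df = df.replace('', np.nan)

# Assumption: these three columns hold a number followed by a unit
for col in ['weight', 'height', 'width']:
    digits = df[col].str.extract(r'(\d+)', expand=False)
    df[col] = pd.to_numeric(digits)
```

After this, df.loc['BMW'] looks up a car by name, and the numeric columns can be used in arithmetic directly.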

Upvotes: 1
