Divya Jose
Divya Jose

Reputation: 389

Pandas Dataframe

I want to represent data using pandas dataframe , the column name - Product Title and populate t .

For eg :

Product Title

Marvel : Movies Collection

Marvel

Diney Movie and so on..


import requests
from bs4 import BeautifulSoup
import csv
import pandas as pd

r= requests.get("http://www.walmart.com/search/?query=marvel&cat_id=4096_530598")
r.content
soup = BeautifulSoup(r.content)

g_data = soup.find_all("div", {"class" : "tile-conent"})
g_price = soup.find_all("div",{"class" : "item-price-container"})
g_star = soup.find_all("div",{"class" : "stars stars-small tile-row"})

for product_title in g_data:
   a_product_title = product_title.find_all("a","js-product-title")
   for text_product_title in a_product_title : 
      t = text_product_title.text  
      print t 

Desired Output-

Product Title : 


Marvel Heroes: Collection      
Marvel: Guardians Of The Galaxy (Widescreen)    
Marvel Complete Giftset (Widescreen)    
Marvel's The Avengers (Widescreen)    
Marvel Knights: Wolverine Versus Sabretooth - Reborn (Widescreen)    
Superheroes Collection: The Incredible Hulk Returns / The Trial Of The Incredible Hulk / How To Draw Comics     The Marvel Way (Widescreen)
Marvel: Iron Man & Hulk - Heroes United (Widescreen)    
Marvel's The Avengers (DVD + Blu-ray) (Widescreen)     
Captain America: The Winter Soldier (Widescreen)    
Iron Man 3 (DVD + Digital Copy) (Widescreen)    
Thor: The Dark World (Widescreen)    
Spider-Man (2-Disc) (Special Edition) (Widescreen)    
Elektra / Fantastic Four / Daredevil (Director's Cut) / Fantastic Four 2: Rise Of The Silver Surfer
Spider-Man / Spider-Man 2 / Spider-Man 3 (Widescreen)    
Spider-Man 2 (Widescreen)    
The Punisher (Extended Cut) (Widescreen)    
DC Showcase: Superman / Shazam!: The Return Of The Black Adam
Ultimate Avengers: The Movie (Widescreen)    
The Next Avengers: Heroes Of Tomorrow (Widescreen)    
Ultimate Avengers 1 & 2 (Blu-ray) (Widescreen) 

I tired append function and join but it dint work.. Do we have any specific function this in pandas dataframe?

The desired output should be outcome of using Pandas dataframe.

Upvotes: 4

Views: 795

Answers (1)

EdChum
EdChum

Reputation: 394031

Well this will get you started, this extracts all the titles into a dict (I use a defaultdict for convenience):

In [163]:

from collections import defaultdict
data=defaultdict(list)
for product_title in g_data:
    a_product_title = product_title.find_all("a","js-product-title")
    for text_title in a_product_title:
        data['Product title'].append(text_title.text)


df = pd.DataFrame(data)
df
Out[163]:
                                        Product title
0                           Marvel Heroes: Collection
1        Marvel: Guardians Of The Galaxy (Widescreen)
2                Marvel Complete Giftset (Widescreen)
3                  Marvel's The Avengers (Widescreen)
4   Marvel Knights: Wolverine Versus Sabretooth - ...
5   Superheroes Collection: The Incredible Hulk Re...
6   Marvel: Iron Man & Hulk - Heroes United (Wides...
7   Marvel's The Avengers (DVD + Blu-ray) (Widescr...
8    Captain America: The Winter Soldier (Widescreen)
9        Iron Man 3 (DVD + Digital Copy) (Widescreen)
10                  Thor: The Dark World (Widescreen)
11  Spider-Man (2-Disc) (Special Edition) (Widescr...
12  Elektra / Fantastic Four / Daredevil (Director...
13  Spider-Man / Spider-Man 2 / Spider-Man 3 (Wide...
14                          Spider-Man 2 (Widescreen)
15           The Punisher (Extended Cut) (Widescreen)
16  DC Showcase: Superman / Shazam!: The Return Of...
17          Ultimate Avengers: The Movie (Widescreen)
18  The Next Avengers: Heroes Of Tomorrow (Widescr...
19     Ultimate Avengers 1 & 2 (Blu-ray) (Widescreen)

So you can modify this script to add the price and actors as keys to the data dict and then construct the df from the resultant dict, this will be better than appending a row at a time

Upvotes: 4

Related Questions