codexo

Reputation: 5

TypeError while web scraping

I am scraping data and want to build a DataFrame with two columns, title and date, but a TypeError occurs:

TypeError: from_dict() got an unexpected keyword argument 'columns'

Code:

import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://timesofindia.indiatimes.com/topic/Hiv'

while True:
    response=requests.get(url)
    soup = BeautifulSoup(response.content,'html.parser')
    content = soup.find_all('div',{'class': 'content'})

    for contents in content:
        title_tag = contents.find('span',{'class':'title'})
        title= title_tag.text[1:-1] if title_tag else 'N/A'
        date_tag = contents.find('span',{'class':'meta'})
        date = date_tag.text if date_tag else 'N/A'

        hiv={title : date}
        print(' title : ', title ,' \n date : ' ,date )

    url_tag = soup.find('div',{'class':'pagination'})
    if url_tag.get('href'):
        url = 'https://timesofindia.indiatimes.com/' + url_tag.get('href')
        print(url)
    else:
        break
hiv1 = pd.DataFrame.from_dict(hiv , orient = 'index' , columns = ['title' ,'date'])

pandas is updated to version 0.23.4, but the error still occurs.

Upvotes: 0

Views: 57

Answers (1)

chitown88

Reputation: 28630

The first thing I noticed is that the dictionary is built incorrectly. I assume you want one dictionary holding every title: date pair. As written, hiv is reassigned on each iteration, so only the last pair is kept.

Once that is fixed, the keys become the index of the DataFrame and the values form its single column, so technically there is only one column. To get two columns, reset the index, which moves it into a regular column that I name 'title'.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://timesofindia.indiatimes.com/topic/Hiv'

response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
content = soup.find_all('div', {'class': 'content'})

# Create the dictionary once, before the loop, so entries accumulate
hiv = {}
for contents in content:
    title_tag = contents.find('span', {'class': 'title'})
    title = title_tag.text[1:-1] if title_tag else 'N/A'
    date_tag = contents.find('span', {'class': 'meta'})
    date = date_tag.text if date_tag else 'N/A'

    # Add this pair instead of replacing the whole dictionary
    hiv[title] = date
    print(' title : ', title, ' \n date : ', date)

# Titles become the index; the dates form the single 'date' column
hiv1 = pd.DataFrame.from_dict(hiv, orient='index', columns=['date'])
# Name the index 'title' and move it into a regular column
hiv1 = hiv1.rename_axis('title').reset_index()

Output:

print (hiv1)
                                                title                  date
0   I told my boyfriend I was HIV positive and thi...           01 Dec 2018
1   Pay attention to these 7 very common HIV sympt...           30 Nov 2018
2   Transfusion of HIV blood: Panel seeks time til...  2019-01-06T03:54:33Z
3   No. of pregnant women testing HIV+ dips; still...           01 Dec 2018
4                             Busted:5 HIV AIDS myths           30 Nov 2018
5                    Myths and taboos related to AIDS           01 Dec 2018
6                                                 N/A                   N/A
7   Mumbai: Free HIV tests at six railway stations...           23 Nov 2018
8   HIV blood tranfusion: Tamil Nadu govt assures ...  2019-01-05T09:05:27Z
9     Autopsy performed on HIV+ve donor’s body at GRH  2019-01-03T07:45:03Z
10  Madras HC directs to videograph HIV+ve donor’s...  2019-01-01T01:23:34Z
11  HIV +ve Tamil Nadu teen who attempted suicide ...  2018-12-31T03:37:56Z
12    Another woman claims she got HIV-infected blood  2018-12-31T06:34:32Z
13    Another woman says she got HIV from donor blood           29 Dec 2018
14  HIV case: Five-member panel begins inquiry in ...           29 Dec 2018
15  Pregnant woman turns HIV positive after blood ...           26 Dec 2018
16  Pregnant woman contracts HIV after blood trans...           26 Dec 2018
17  Man attacks niece born with HIV for sleeping i...           16 Dec 2018
18  Health ministry implements HIV AIDS Act 2017: ...           11 Sep 2018
19  When meds don’t heal: HIV+ kids fight daily wa...           03 Sep 2018
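
The index-to-column step can also be tried offline on a small hand-made dictionary. This is only a minimal sketch: the two headlines are made-up stand-ins for the scraped data, and it assumes a pandas version that accepts columns in from_dict() (0.23.0 or later).

```python
import pandas as pd

# Hypothetical sample data standing in for the scraped title -> date pairs.
sample = {
    "First headline": "01 Dec 2018",
    "Second headline": "30 Nov 2018",
}

# With orient='index' the keys become the index and the values fill one column.
df = pd.DataFrame.from_dict(sample, orient="index", columns=["date"])
print(df.shape)  # two rows, a single 'date' column

# Name the index 'title', then move it into a regular column.
df = df.rename_axis("title").reset_index()
print(list(df.columns))
```

After reset_index() the frame has the two columns 'title' and 'date' and a plain integer index.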

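I'm not sure why you're getting the error, though. The columns argument of from_dict() was added in pandas 0.23.0, so it should work with 0.23.4. The error suggests the script is being run with an older pandas installation than the one you updated. Check pd.__version__ from inside the script, or uninstall pandas and reinstall it with pip.

Otherwise, you can do it in two lines, naming the columns after converting to a DataFrame: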
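
One way to see which pandas the script really imports is to print its version and file location from the same interpreter that runs the scraper. This is a rough sketch, not a full diagnosis; the supports_columns name is made up for illustration, and the version parsing assumes the usual major.minor prefix.

```python
import pandas as pd

# Show which pandas this interpreter imports, and from where on disk.
print(pd.__version__)
print(pd.__file__)

# from_dict() accepts `columns` only from pandas 0.23.0 onward.
# Assumption: the version string starts with "<major>.<minor>".
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
supports_columns = (major, minor) >= (0, 23)
print(supports_columns)
```

If the printed version is lower than the one pip reports, the script is probably running under a different Python environment than the one that was updated.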

hiv1 = pd.DataFrame.from_dict(hiv, orient = 'index').reset_index()
hiv1.columns = ['title','date']
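
One caveat with keying the dictionary by title: two items with the same title (for example two 'N/A' placeholders) collapse into a single row. A possible alternative, sketched here with made-up rows rather than real scraped data, is to collect (title, date) tuples in a list and pass the list to the DataFrame constructor, which also avoids from_dict() entirely.

```python
import pandas as pd

# Hypothetical scraped rows; note the two identical 'N/A' entries.
rows = [
    ("First headline", "01 Dec 2018"),
    ("N/A", "N/A"),
    ("N/A", "N/A"),
]

# A dict keyed by title keeps only one row per distinct title.
as_dict = dict(rows)
print(len(as_dict))

# A list of tuples keeps every row, and the columns are named directly.
hiv1 = pd.DataFrame(rows, columns=["title", "date"])
print(len(hiv1))
```

In the scraping loop this would mean appending (title, date) to a list instead of updating the dictionary.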

Upvotes: 1
