Reputation: 5
I am scraping data and want to build a DataFrame with two columns, title and date, but I get a TypeError:
TypeError: from_dict() got an unexpected keyword argument 'columns'
Code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://timesofindia.indiatimes.com/topic/Hiv'
while True:
    response=requests.get(url)
    soup = BeautifulSoup(response.content,'html.parser')
    content = soup.find_all('div',{'class': 'content'})
    for contents in content:
        title_tag = contents.find('span',{'class':'title'})
        title= title_tag.text[1:-1] if title_tag else 'N/A'
        date_tag = contents.find('span',{'class':'meta'})
        date = date_tag.text if date_tag else 'N/A'
        hiv={title : date}
        print(' title : ', title ,' \n date : ' ,date )
    url_tag = soup.find('div',{'class':'pagination'})
    if url_tag.get('href'):
        url = 'https://timesofindia.indiatimes.com/' + url_tag.get('href')
        print(url)
    else:
        break
hiv1 = pd.DataFrame.from_dict(hiv , orient = 'index' , columns = ['title' ,'date'])
I have updated pandas to version 0.23.4, but the error still occurs.
Upvotes: 0
Views: 57
Reputation: 28630
The first thing I noticed is that the dictionary is built incorrectly. I'm assuming you want one dictionary holding every title: date pair. As written, hiv is reassigned on each iteration, so only the last pair is kept.
Once that is fixed, the keys become the index of the dataframe and the values become its single column, so technically there is only one column. To get two columns, I reset the index, which moves it into a regular column that I name 'title':
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://timesofindia.indiatimes.com/topic/Hiv'
response=requests.get(url)
soup = BeautifulSoup(response.content,'html.parser')
content = soup.find_all('div',{'class': 'content'})
hiv = {}
for contents in content:
    title_tag = contents.find('span',{'class':'title'})
    title= title_tag.text[1:-1] if title_tag else 'N/A'
    date_tag = contents.find('span',{'class':'meta'})
    date = date_tag.text if date_tag else 'N/A'
    hiv.update({title : date})
    print(' title : ', title ,' \n date : ' ,date )
hiv1 = pd.DataFrame.from_dict(hiv , orient = 'index' , columns = ['date'])
hiv1 = hiv1.rename_axis('title').reset_index()
Output:
print (hiv1)
    title                                              date
 0  I told my boyfriend I was HIV positive and thi...  01 Dec 2018
 1  Pay attention to these 7 very common HIV sympt...  30 Nov 2018
 2  Transfusion of HIV blood: Panel seeks time til...  2019-01-06T03:54:33Z
 3  No. of pregnant women testing HIV+ dips; still...  01 Dec 2018
 4  Busted:5 HIV AIDS myths                            30 Nov 2018
 5  Myths and taboos related to AIDS                   01 Dec 2018
 6  N/A                                                N/A
 7  Mumbai: Free HIV tests at six railway stations...  23 Nov 2018
 8  HIV blood tranfusion: Tamil Nadu govt assures ...  2019-01-05T09:05:27Z
 9  Autopsy performed on HIV+ve donor’s body at GRH    2019-01-03T07:45:03Z
10  Madras HC directs to videograph HIV+ve donor’s...  2019-01-01T01:23:34Z
11  HIV +ve Tamil Nadu teen who attempted suicide ...  2018-12-31T03:37:56Z
12  Another woman claims she got HIV-infected blood    2018-12-31T06:34:32Z
13  Another woman says she got HIV from donor blood    29 Dec 2018
14  HIV case: Five-member panel begins inquiry in ...  29 Dec 2018
15  Pregnant woman turns HIV positive after blood ...  26 Dec 2018
16  Pregnant woman contracts HIV after blood trans...  26 Dec 2018
17  Man attacks niece born with HIV for sleeping i...  16 Dec 2018
18  Health ministry implements HIV AIDS Act 2017: ...  11 Sep 2018
19  When meds don’t heal: HIV+ kids fight daily wa...  03 Sep 2018
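To try the same idea without hitting the website, here is a minimal offline sketch. The sample titles and dates are made up for illustration; only the pandas calls mirror the code above, and the columns argument assumes pandas 0.23 or later.

```python
import pandas as pd

# Hypothetical sample rows standing in for the scraped (title, date) pairs.
rows = [
    ("First headline", "01 Dec 2018"),
    ("Second headline", "30 Nov 2018"),
    ("Third headline", "23 Nov 2018"),
]

# Reassigning the dict inside the loop keeps only the last pair.
for title, date in rows:
    overwritten = {title: date}

# Filling one dict created before the loop keeps every pair.
accumulated = {}
for title, date in rows:
    accumulated[title] = date

# With orient='index' the keys become the index, so 'date' is the only column.
df = pd.DataFrame.from_dict(accumulated, orient="index", columns=["date"])

# Name the index 'title' and move it into a regular column.
df = df.rename_axis("title").reset_index()
print(df)
```

Here overwritten ends up with a single entry, while df has one row per pair and the two columns title and date.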
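I'm not sure why you're getting the error, though. The columns argument of from_dict was added in pandas 0.23.0, so it should not fail on 0.23.4. The script may be running against a different, older installation than the one you updated; printing pd.__version__ from inside the script will tell you. If it is older, try uninstalling pandas and reinstalling it with pip.
Otherwise, you could do it in two lines and name the columns after converting to a dataframe: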
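A quick way to check which pandas the script really imports is sketched below. It assumes the usual cause of this kind of mismatch, namely that pip upgraded pandas for a different Python interpreter than the one running the script; that is a guess, not something confirmed for this case.

```python
import sys
import pandas as pd

# Interpreter actually running this script; pip may have upgraded another one.
print("python:", sys.executable)
print("pandas:", pd.__version__, "loaded from", pd.__file__)

# Compare the leading (major, minor) numbers against 0.23,
# the release that added the columns argument to from_dict.
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
supports_columns = (major, minor) >= (0, 23)
print("from_dict accepts columns:", supports_columns)
```

If the reported version is below 0.23, running the interpreter shown on the first line with -m pip install --upgrade pandas targets the right installation.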
hiv1 = pd.DataFrame.from_dict(hiv, orient = 'index').reset_index()
hiv1.columns = ['title','date']
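Another option, which is not part of the code above, is to skip the dictionary entirely and collect the pairs in a list. This is only a sketch with made-up sample records; it avoids from_dict altogether and also keeps rows whose titles repeat (for example several 'N/A' entries), which a dictionary would collapse into one key.

```python
import pandas as pd

# Hypothetical scraped pairs; note the repeated 'N/A' title.
records = [
    ("First headline", "01 Dec 2018"),
    ("N/A", "N/A"),
    ("N/A", "30 Nov 2018"),
]

# A list of tuples maps directly onto named columns and keeps duplicates.
hiv1 = pd.DataFrame(records, columns=["title", "date"])
print(hiv1)
```

In the scraping loop this would mean starting with an empty list and appending (title, date) on each iteration instead of updating hiv.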
Upvotes: 1