Zac

Reputation: 323

Convert/clean scraped data to a pandas DataFrame in Python

Hi there, I have scraped data from a website that looks like this:

"header1","header2","header3","header4","header5":"value1-1","value1-2","value1-3","value1-4":"value2-1","   value2-2","   value2-3"," value2-4":

The raw data has double quotes and whitespace around the values, which I want to remove. I then want to convert the extracted data into a pandas DataFrame like the one below. Note: in the raw data, each row ends with a colon (:).

header1    header2    header3    header4    header5
value1-1   value1-2   value1-3   value1-4   value1-5
value2-1   value2-2   value2-3   value2-4   value2-5

Please suggest an easy fix for this.

Upvotes: 1

Views: 80

Answers (2)

Vishnudev Krishnadas

Reputation: 10960

Use the lineterminator argument of read_csv so that each colon ends a row:

import pandas as pd
pd.read_csv(filepath, sep=',', lineterminator=':')

OR

For text-based input (the scraped string itself rather than a file), wrap it in StringIO, as suggested by cs95:

import pandas as pd
from io import StringIO
pd.read_csv(StringIO(text), sep=',', lineterminator=':')
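read_csv removes the quote characters, but whitespace that sits inside the quotes (such as "   value2-2") is kept as part of the value. A minimal sketch, assuming the sample string from the question is held in text, that strips it after parsing:

```python
from io import StringIO

import pandas as pd

# Sample string copied from the question
text = '"header1","header2","header3","header4","header5":"value1-1","value1-2","value1-3","value1-4":"value2-1","   value2-2","   value2-3"," value2-4":'

# Each ':' ends a row; the default quotechar ('"') removes the surrounding quotes
df = pd.read_csv(StringIO(text), sep=",", lineterminator=":")

# Spaces inside the quotes survive parsing, so strip them from every string cell;
# non-string cells such as NaN are left unchanged
df.columns = df.columns.str.strip()
df = df.apply(lambda col: col.map(lambda v: v.strip() if isinstance(v, str) else v))
print(df)
```

With this sample, header5 comes out as NaN, because each data row in the scraped string has only four values.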

Upvotes: 2

anky

Reputation: 75080

Assuming you have saved the string in a variable s, try:

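import pandas as pd
a = s.split(":")  # each row ends with ':'
# split each non-empty row on ',' and strip the quotes and surrounding whitespace
b = [[v.strip().strip('"').strip() for v in i.split(",")] for i in a if len(i) > 0]
output_df = pd.DataFrame(b[1:], columns=b[0])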
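Splitting on "," breaks if a value itself contains a comma, and pd.DataFrame raises a ValueError when a row has fewer values than there are headers (as in the sample, where each data row has four values against five headers). The sketch below swaps in Python's csv module for the quote handling and pads short rows with None. The function name parse_scraped is hypothetical, and both the padding and the assumption that ':' never appears inside a value are not part of the original answer:

```python
import csv

import pandas as pd


def parse_scraped(raw: str) -> pd.DataFrame:
    """Parse ':'-terminated, comma-separated, quoted rows into a DataFrame."""
    # Split into rows on ':' (assumes ':' never appears inside a value)
    lines = [line for line in raw.split(":") if line.strip()]
    # csv.reader accepts any iterable of strings; it removes the double quotes
    # and keeps commas that appear inside quoted values
    table = [[v.strip() for v in row] for row in csv.reader(lines)]
    header, data = table[0], table[1:]
    # Pad short rows with None so each row has one entry per header (assumption)
    data = [row + [None] * (len(header) - len(row)) for row in data]
    return pd.DataFrame(data, columns=header)


s = '"header1","header2","header3","header4","header5":"value1-1","value1-2","value1-3","value1-4":"value2-1","   value2-2","   value2-3"," value2-4":'
df = parse_scraped(s)
print(df)
```

Rows with more values than headers are not truncated, so pd.DataFrame would still reject them; that case is left to the caller.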

Upvotes: 1
