Reputation: 323
Hi there, I have scraped data from a website that looks like this:
"header1","header2","header3","header4","header5":"value1-1","value1-2","value1-3","value1-4":"value2-1"," value2-2"," value2-3"," value2-4":
The raw data has double quotes and whitespace inside the values, which I want to remove, and I want to convert the data extracted from the website into a pandas DataFrame as shown below. Note that each row ends at a colon (:) in the raw data.
header1 header2 header3 header4 header5
value1-1 value1-2 value1-3 value1-4 value1-5
value2-1 value2-2 value2-3 value2-4 value2-5
Please suggest an easy fix for this.
Upvotes: 1
Views: 80
Reputation: 10960
Use the lineterminator argument:
pd.read_csv(filepath, sep=',', lineterminator=':')
OR
For text-based input, as suggested by cs95:
from io import StringIO
pd.read_csv(StringIO(text), sep=',', lineterminator=':')
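A fuller sketch of the same idea, using the sample string from the question as `text`. read_csv removes the surrounding double quotes on its own, but the spaces sit inside the quotes, so they survive parsing. The `dtype=str` argument and the strip step are my additions, not part of the suggestion above; they assume every column should be treated as text.

```python
from io import StringIO

import pandas as pd

# Sample input copied from the question: rows end at ':' and some
# values carry a leading space inside the quotes.
text = (
    '"header1","header2","header3","header4","header5":'
    '"value1-1","value1-2","value1-3","value1-4":'
    '"value2-1"," value2-2"," value2-3"," value2-4":'
)

# dtype=str keeps every column as text so the .str accessor works below
# (assumption: no numeric columns are needed).
df = pd.read_csv(StringIO(text), sep=',', lineterminator=':', dtype=str)

# The parser drops the double quotes; strip the leftover whitespace
# from the column names and from every cell.
df.columns = df.columns.str.strip()
df = df.apply(lambda col: col.str.strip())

print(df)
```

Rows that have fewer values than the header, as in the sample, are filled with NaN in the trailing columns.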
Upvotes: 2
Reputation: 75080
Assuming you have saved the string in a variable s, try:
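import pandas as pd

a = s.split(":")  # each row ends at a colon
# split every non-empty row on commas and strip the quotes and stray spaces
b = [[v.strip('" ') for v in i.split(",")] for i in a if len(i) > 0]
output_df = pd.DataFrame(b[1:], columns=b[0])  # first row is the header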
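One caveat with the list-splitting approach: in the sample from the question the header has five names but each data row has only four values, and the DataFrame constructor rejects rows whose length does not match the column list. A minimal sketch that pads short rows with None first; the padding step and the names `rows`, `header`, and `body` are my own, and it assumes missing values always belong to the last columns.

```python
import pandas as pd

# Sample input copied from the question.
s = (
    '"header1","header2","header3","header4","header5":'
    '"value1-1","value1-2","value1-3","value1-4":'
    '"value2-1"," value2-2"," value2-3"," value2-4":'
)

# Split into rows at ':' and into cells at ',', stripping quotes and spaces.
rows = [[v.strip('" ') for v in r.split(",")] for r in s.split(":") if r]
header, body = rows[0], rows[1:]

# Pad short rows with None so every row has as many cells as the header.
body = [r + [None] * (len(header) - len(r)) for r in body]

output_df = pd.DataFrame(body, columns=header)
print(output_df)
```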
Upvotes: 1