Reputation: 233
I have used BeautifulSoup to collect a 2D data array from a website as a string. I believe the table format is related to JSON; however, when I apply pandas.read_json() to the string, it raises a ValueError. I tried converting "null" to "0" and removing the "\n" characters from the string, to no avail.
data_str = """[[{label:'column 1',type:'number'},{label:'column2',type:'number'},{label:'column 3',type:'number'}],
[205, null, 89748],
[206, null, 66813],
[235, 75, null],
[236, 138, null]]"""
I can convert the string to a pandas DataFrame by splitting the first row of the table containing the column names from the data entries, but this seems rather clumsy (see below).
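import ast
import re  # needed for re.findall below
import numpy as np
import pandas as pd
# Split the header row (column definitions) from the data rows
col_names, data_str = data_str.split('\n', 1)
# Extract the column labels from the header row
col_names = re.findall(r'label:\'(.*?)\'', col_names)
# Flatten the data rows and replace null so the text is a valid Python literal
data_str = data_str.replace('\n', '')
data_str = data_str.replace('null', '0.')
# Restore the opening bracket that was split off with the header row
data_arr = np.array(ast.literal_eval('[' + data_str))
data_df = pd.DataFrame(data_arr, columns=col_names)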
Is there a more pythonic way to convert the string to a pandas DataFrame?
Upvotes: 0
Views: 187
Reputation: 833
No, it's not valid JSON but a JavaScript object literal in a raw string. You need to install another module, such as demjson. See the answer here for more details.
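If installing an extra parser is not an option, a rough standard-library sketch for this particular string is to quote the bare keys and swap the single quotes for double quotes, which turns it into valid JSON. The helper name js_literal_to_table is hypothetical, and the approach assumes the labels contain no quotes and no text that looks like a key followed by a colon; it is not a general JavaScript parser.

```python
import json
import re

data_str = """[[{label:'column 1',type:'number'},{label:'column2',type:'number'},{label:'column 3',type:'number'}],
[205, null, 89748],
[206, null, 66813],
[235, 75, null],
[236, 138, null]]"""


def js_literal_to_table(text):
    """Hypothetical helper: return (column labels, data rows) from the literal."""
    # Quote bare object keys, e.g. {label: becomes {"label":
    quoted = re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', text)
    # JSON only accepts double-quoted strings (assumes no embedded quotes)
    quoted = quoted.replace("'", '"')
    # The first row holds the column definitions, the rest are data rows
    header, *rows = json.loads(quoted)
    columns = [col["label"] for col in header]
    return columns, rows


columns, rows = js_literal_to_table(data_str)
```

From there, pd.DataFrame(rows, columns=columns) should build the frame, with each null arriving as None rather than being replaced by 0.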
Upvotes: 1