Paul

Reputation: 233

Converting a 2D JSON(?) array to a pandas DataFrame

I have used BeautifulSoup to collect a 2D data array from a website as a string. I believe the table format is related to JSON; however, when I apply pandas.read_json() to the string, it raises a ValueError. I tried converting "null" to "0" and removing the "\n" characters from the string, to no avail.

data_str = """[[{label:'column 1',type:'number'},{label:'column2',type:'number'},{label:'column 3',type:'number'}],
[205, null,  89748],
[206, null,  66813],
[235,   75,   null],
[236,  138,   null]]"""

I can convert the string to a pandas DataFrame by splitting the first row of the table containing the column names from the data entries, but this seems rather clumsy (see below).

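import ast
import re

import numpy as np
import pandas as pd

# Separate the header row (column definitions) from the data rows.
col_names, data_str = data_str.split('\n', 1)
col_names = re.findall(r"label:'(.*?)'", col_names)

# Join the data rows and replace null so the text is a valid Python literal.
data_str = data_str.replace('\n', '')
data_str = data_str.replace('null', '0.')

# Restore the opening bracket that was split off with the header row.
data_arr = np.array(ast.literal_eval('[' + data_str))
data_df = pd.DataFrame(data_arr, columns=col_names)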

Is there a more pythonic way to convert the string to a pandas DataFrame?

Upvotes: 0

Views: 187

Answers (1)

SCKU

Reputation: 833

No. This is not valid JSON; it is a JavaScript object literal stored as a raw string (the keys are unquoted and the strings use single quotes). You need to install another module such as demjson. See the answer here for more details.
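If installing an extra module is not an option, a rough alternative for input as simple as the one in the question is to rewrite the string into valid JSON with the standard library and then parse it. This is only a sketch, not a general JavaScript parser: it assumes the keys are plain identifiers and that the string values contain no quotes or key-like text. The names json_str, table, header and rows are made up for this example.

```python
import json
import re

data_str = """[[{label:'column 1',type:'number'},{label:'column2',type:'number'},{label:'column 3',type:'number'}],
[205, null,  89748],
[206, null,  66813],
[235,   75,   null],
[236,  138,   null]]"""

# Quote bare object keys, e.g. {label: -> {"label": and ,type: -> ,"type":
# (assumption: keys are simple identifiers preceded by "{" or ",").
json_str = re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', data_str)

# JSON requires double-quoted strings
# (assumption: no quote characters inside the string values).
json_str = json_str.replace("'", '"')

# The text is now valid JSON; null is parsed as None.
table = json.loads(json_str)

# The first row holds the column definitions, the rest is the data.
header, rows = table[0], table[1:]
col_names = [col["label"] for col in header]
```

From there, pd.DataFrame(rows, columns=col_names) builds the frame, and the None entries show up as NaN instead of being replaced by 0.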

Upvotes: 1
