emrepun

Reputation: 2666

Convert DataFrame with url in string format to JSON properly

I have a data frame with 2 columns, one of which consists of URLs.

Sample code:

import pandas as pd
# DataFrame.append was removed in pandas 2.0, so the rows are built in one call
df = pd.DataFrame([
    {'name': 'sample_name', 'image': 'https://images.pexels.com/photos/736230/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500'},
    {'name': 'sample_name2', 'image': 'https://cdn.theatlantic.com/assets/media/img/mt/2017/10/Pict1_Ursinia_calendulifolia/lead_720_405.jpg?mod=1533691909'},
], columns=['name', 'image'])

I want to convert this DataFrame to JSON directly. I used the DataFrame's to_json() method, but it mangles the URLs in the data frame.

Conversion to JSON:

json = df.to_json(orient='records')

When I print it, I see that the conversion has inserted a '\' character before every '/' character in my URLs.

print(json)

Result:

[{"name":"sample_name","image":"https:\/\/images.pexels.com\/photos\/736230\/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"},{"name":"sample_name2","image":"https:\/\/cdn.theatlantic.com\/assets\/media\/img\/mt\/2017\/10\/Pict1_Ursinia_calendulifolia\/lead_720_405.jpg?mod=1533691909"}]

I want the JSON to look like this (no extra '\' in the URLs):

[{"name":"sample_name","image":"https://images.pexels.com/photos/736230/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"},{"name":"sample_name2","image":"https://cdn.theatlantic.com/assets/media/img/mt/2017/10/Pict1_Ursinia_calendulifolia/lead_720_405.jpg?mod=1533691909"}]

I checked the documentation for to_json() and other questions as well, but couldn't find a way to deal with this. How can I convert my URL strings to JSON exactly as they appear in the data frame?

Upvotes: 7

Views: 1753

Answers (1)

willeM_ Van Onsem

Reputation: 477230

Pandas uses ujson [PyPI] internally to encode the data to a JSON blob. By default, ujson escapes forward slashes; this behaviour is controlled by its escape_forward_slashes option.
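Note that the escaped form is still valid JSON: the JSON grammar allows "\/" as an escape for "/", so a conforming parser should give back the original URL. A small check using only the standard library, where the escaped text is typed by hand (with a shortened version of the first URL) rather than produced by pandas:

```python
import json

# Escaped text in the style of the question's output, typed by hand here;
# the query string of the URL is left out to keep the line short.
escaped = r'[{"name":"sample_name","image":"https:\/\/images.pexels.com\/photos\/736230\/pexels-photo-736230.jpeg"}]'

# Parsing resolves each "\/" escape back to a plain "/".
records = json.loads(escaped)
url = records[0]["image"]
print(url)
```

So if the JSON is only ever consumed by a JSON parser, the extra backslashes are cosmetic and the data itself is unchanged.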

You can instead call json.dumps(…) on the result of converting your DataFrame to a list of dictionaries with .to_dict:

>>> import json
>>> print(json.dumps(df.to_dict('records')))
[{"name": "sample_name", "image": "https://images.pexels.com/photos/736230/pexels-photo-736230.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"}, {"name": "sample_name2", "image": "https://cdn.theatlantic.com/assets/media/img/mt/2017/10/Pict1_Ursinia_calendulifolia/lead_720_405.jpg?mod=1533691909"}]
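If you would rather keep to_json(), for example because the frame holds values that json.dumps cannot serialize on its own, one possible workaround is to strip the escapes from its output afterwards. This is only a sketch: the helper name unescape_slashes is hypothetical, the example.com URL is a placeholder, and it assumes the text came from an encoder that escapes every forward slash, as described above.

```python
import json

def unescape_slashes(json_text: str) -> str:
    """Turn every '\\/' escape in JSON text back into a plain '/'."""
    # Assumes every '/' in the input is escaped, so a backslash directly
    # before a slash is always part of a '\/' escape.
    return json_text.replace("\\/", "/")

# Hand-written stand-in for the text returned by df.to_json(orient='records')
escaped = r'[{"name":"sample_name","image":"https:\/\/example.com\/a.jpg"}]'
plain = unescape_slashes(escaped)
print(plain)
```

With a real frame this would be called as unescape_slashes(df.to_json(orient='records')). Both strings parse to the same data, so the replacement only changes how the text looks.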

Upvotes: 7
