yudhiesh

Reputation: 6799

How can I remove some keys from a JSON string where the values are preventing me from parsing it as JSON?

I have this string coming in from an HTTP request:

s = "{'id': 81, 'udate': datetime.datetime(2021, 2, 3, 7, 20, 5, 369376, tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)), 'cdate': datetime.datetime(2021, 3, 11, 9, 50, 0, 984521, tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)), 'screen_name': 'Hellas Utrecht', 'follower_id': '310489102', 'is_unfollow': True, 'user_id': 8, 'follower_description': 'Atletiekvereniging Hellas Utrecht heeft zo’n 1600 leden, verdeeld over de afd. jeugd-, weg- en baanatletiek, recreatie en triathlon.', 'follower_favourites_count': '675', 'follower_followers_count': '741', 'follower_listed_count': '9', 'follower_location': 'Utrecht', 'follower_screen_name': 'HellasUtrecht', 'follower_statuses_count': '904'}"

I need to convert it to a dictionary, but the values of the keys udate and cdate prevent this because they are function calls, e.g. datetime.datetime(2021, 2, 3, 7, 20, 5, 369376, tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)). Passing the string to ast.literal_eval throws the error malformed node or string: <_ast.Call object at 0x109346890>.

Currently my solution is to convert the string to a dictionary manually by slicing off the part that contains the dates (but this also drops the id):

import ast

ast.literal_eval('{'+s[250:])
>>>{'screen_name': 'Hellas Utrecht', 'follower_id': '310489102', 'is_unfollow': True, 'user_id': 8, 'follower_description': 'Atletiekvereniging Hellas Utrecht heeft zo’n 1600 leden, verdeeld over de afd. jeugd-, weg- en baanatletiek, recreatie en triathlon.', 'follower_favourites_count': '675', 'follower_followers_count': '741', 'follower_listed_count': '9', 'follower_location': 'Utrecht', 'follower_screen_name': 'HellasUtrecht', 'follower_statuses_count': '904'}

Is there a better way to do this, for example with a regex? I just need the keys udate and cdate and their values removed.

Expected output:

"{'id': 81, 'screen_name': 'Hellas Utrecht', 'follower_id': '310489102', 'is_unfollow': True, 'user_id': 8, 'follower_description': 'Atletiekvereniging Hellas Utrecht heeft zo’n 1600 leden, verdeeld over de afd. jeugd-, weg- en baanatletiek, recreatie en triathlon.', 'follower_favourites_count': '675', 'follower_followers_count': '741', 'follower_listed_count': '9', 'follower_location': 'Utrecht', 'follower_screen_name': 'HellasUtrecht', 'follower_statuses_count': '904'}"

Upvotes: 2

Views: 56

Answers (1)

Wiktor Stribiżew

Reputation: 626738

You can remove these two keys with their values using

s = re.sub(r"'[uc]date':\s*datetime\.datetime\([^()]+\([^()]*\)\)\s*,?", '', s)

See the regex demo. Details:

  • '[uc]date': - 'udate': or 'cdate':
  • \s* - zero or more whitespaces
  • datetime\.datetime\( - a datetime.datetime( string
  • [^()]+ - one or more chars other than ( and )
  • \( - a ( char
  • [^()]* - zero or more chars other than ( and )
  • \)\) - a )) string
  • \s* - zero or more whitespaces
  • ,? - an optional comma.
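The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch that writes the same regex with re.VERBOSE so each part carries its own comment; the name DATE_ENTRY and the shortened sample string are made up for illustration and are not part of the original data.

```python
import ast
import re

# Same pattern as above, spelled out with re.VERBOSE so every part is commented.
# DATE_ENTRY is a hypothetical name chosen for this sketch.
DATE_ENTRY = re.compile(
    r"""
    '[uc]date':            # the key: 'udate': or 'cdate':
    \s*                    # zero or more whitespaces
    datetime\.datetime\(   # literal text datetime.datetime followed by (
    [^()]+                 # one or more chars other than ( and )
    \(                     # the ( that opens the nested tzinfo call
    [^()]*                 # zero or more chars other than ( and )
    \)\)                   # the two closing parentheses
    \s*                    # zero or more whitespaces
    ,?                     # an optional comma
    """,
    re.VERBOSE,
)

# Shortened, made-up sample in the same shape as the string from the question.
sample = (
    "{'id': 1, 'udate': datetime.datetime(2021, 2, 3, "
    "tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)), 'name': 'x'}"
)

cleaned = DATE_ENTRY.sub("", sample)
print(ast.literal_eval(cleaned))
```

Note that the pattern assumes exactly one nested call inside datetime.datetime(...); a value with deeper nesting or with no tzinfo argument would not match.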

Then, you may use ast.literal_eval on the result, see the Python demo:

import re, ast
s = "{'id': 81, 'udate': datetime.datetime(2021, 2, 3, 7, 20, 5, 369376, tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)), 'cdate': datetime.datetime(2021, 3, 11, 9, 50, 0, 984521, tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)), 'screen_name': 'Hellas Utrecht', 'follower_id': '310489102', 'is_unfollow': True, 'user_id': 8, 'follower_description': 'Atletiekvereniging Hellas Utrecht heeft zo’n 1600 leden, verdeeld over de afd. jeugd-, weg- en baanatletiek, recreatie en triathlon.', 'follower_favourites_count': '675', 'follower_followers_count': '741', 'follower_listed_count': '9', 'follower_location': 'Utrecht', 'follower_screen_name': 'HellasUtrecht', 'follower_statuses_count': '904'}"
s = re.sub(r"'[uc]date':\s*datetime\.datetime\([^()]+\([^()]*\)\)\s*,?", '', s)
print(ast.literal_eval(s))
=> {'id': 81, 'screen_name': 'Hellas Utrecht', 'follower_id': '310489102', 'is_unfollow': True, 'user_id': 8, 'follower_description': 'Atletiekvereniging Hellas Utrecht heeft zo’n 1600 leden, verdeeld over de afd. jeugd-, weg- en baanatletiek, recreatie en triathlon.', 'follower_favourites_count': '675', 'follower_followers_count': '741', 'follower_listed_count': '9', 'follower_location': 'Utrecht', 'follower_screen_name': 'HellasUtrecht', 'follower_statuses_count': '904'}
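
If more keys than udate and cdate can hold such values, the two steps can be wrapped in a small helper. This is only a sketch: the function name drop_datetime_keys and its keys parameter are assumptions made for illustration, and it still relies on the values having the exact datetime.datetime(..., tzinfo=...(...)) shape shown in the question.

```python
import ast
import re


def drop_datetime_keys(text, keys=("udate", "cdate")):
    """Strip the given keys (with datetime.datetime(...) values) and parse the rest."""
    # Build an alternation such as udate|cdate, escaping each key name.
    names = "|".join(re.escape(k) for k in keys)
    pattern = rf"'(?:{names})':\s*datetime\.datetime\([^()]+\([^()]*\)\)\s*,?"
    # A comma left over before the closing brace is still valid for literal_eval.
    return ast.literal_eval(re.sub(pattern, "", text))


# Shortened sample based on the question, with the date key in last position.
sample = (
    "{'id': 81, 'user_id': 8, 'cdate': datetime.datetime(2021, 3, 11, 9, 50, 0, 984521, "
    "tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=0, name=None))}"
)
print(drop_datetime_keys(sample))
```

Because Python accepts a trailing comma in a dict literal, the helper also works when the removed key is the last entry.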

Upvotes: 1
