pandas dataframe - how to extract particular values inside json object

My JSON looks like this:

json_obj = [{'extracted_value': {'other': 'Not found', 'sound': 'false', 'longterm': 'false', 'physician': 'false'}, 'page_num': '33', 'score': '0.75', 'number': 12223611, 'misc':'true'}]

df = pd.DataFrame(json_obj)[['extracted_value', 'page_num', 'score', 'number']]

I am extracting only the columns above. Now I want to drop the 'other': 'Not found' pair from the extracted_value column and keep the remaining values as they are.

Upvotes: 0

Views: 188

Answers (1)

simpleApp

Reputation: 3158

You can try df['extracted_value'].apply(remove_other), i.e. apply a function to each value in the extracted_value column.

The complete code is:

import pandas as pd

json_obj = [{'extracted_value': {'other': 'Not found', 'sound': 'false', 'longterm': 'false', 'physician': 'false'}, 'page_num': '33', 'score': '0.75', 'number': 12223611, 'misc': 'true'}]
df = pd.DataFrame(json_obj)[['extracted_value', 'page_num', 'number']]

def remove_other(my_dict):
    # keep every pair except the 'other' key and any value equal to 'Not found'
    return {k: v for k, v in my_dict.items() if k != 'other' and v != 'Not found'}

df['extracted_value'] = df['extracted_value'].apply(remove_other)

and the result will be:

                                     extracted_value page_num    number
0  {'sound': 'false', 'longterm': 'false', 'physi...       33  12223611
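
The function above drops the 'other' key whatever its value. If the intent is to drop it only when its value is 'Not found', a stricter filter might look like the sketch below. The name remove_other_if_not_found and the 'some note' sample are made up for illustration, and the exact spelling 'Not found' is assumed from the sample data.

```python
import pandas as pd

json_obj = [{'extracted_value': {'other': 'Not found', 'sound': 'false',
                                 'longterm': 'false', 'physician': 'false'},
             'page_num': '33', 'score': '0.75', 'number': 12223611, 'misc': 'true'}]
df = pd.DataFrame(json_obj)[['extracted_value', 'page_num', 'number']]

def remove_other_if_not_found(my_dict):
    # hypothetical variant: drop 'other' only when its value is exactly 'Not found'
    return {k: v for k, v in my_dict.items()
            if not (k == 'other' and v == 'Not found')}

df['extracted_value'] = df['extracted_value'].apply(remove_other_if_not_found)
cleaned = df.loc[0, 'extracted_value']
print(cleaned)

# an 'other' entry that carries a real value is kept
kept = remove_other_if_not_found({'other': 'some note', 'sound': 'true'})
print(kept)
```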

Additional response:

  1. df['extracted_value'].apply(remove_other) passes each value in the column to the function as its argument. Adding print(my_dict) inside remove_other makes this easier to see.

  2. To filter on the key only, remove the value check from the and condition:

def remove_other(my_dict):
    # drop only the 'other' key, whatever its value
    return {k: v for k, v in my_dict.items() if k != 'other'}
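
To see what apply hands to the function, a small sketch like the following may help. It records each argument in a list instead of printing it; the second row and the seen list are invented for illustration and are not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({'extracted_value': [
    {'other': 'Not found', 'sound': 'false'},
    {'other': 'Not found', 'sound': 'true'},   # made-up second row
]})

seen = []  # records every argument that apply passes to the function

def remove_other(my_dict):
    seen.append(dict(my_dict))  # store a copy of the incoming cell value
    return {k: v for k, v in my_dict.items() if k != 'other'}

result = df['extracted_value'].apply(remove_other)
print(seen)             # the dicts stored in the column, one per row
print(result.tolist())  # the same dicts without the 'other' key
```

Each call receives the dict stored in one cell, not the whole column, which is why the function can treat its argument as a plain dictionary.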
    

I would suggest getting familiar with how JSON is nested. In this case you need to go to [0]['coord'][0], so the function will look like:

# Section_Page_start and Section_End_Page
def get_start_and_end(var1):
    my_dict=var1[0]['coord'][0]
    return {ek:my_dict[ek] for ek in my_dict if ek in ['Section_Page_start','Section_End_Page']}
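
As a usage sketch, the function can be tried on a hand-built value. Only the [0]['coord'][0] nesting and the two key names come from the answer; the sample values and the extra 'x0' and 'label' keys are assumptions made up for illustration.

```python
def get_start_and_end(var1):
    # walk down to the first entry of 'coord' in the first list element
    my_dict = var1[0]['coord'][0]
    wanted = ('Section_Page_start', 'Section_End_Page')
    return {k: my_dict[k] for k in my_dict if k in wanted}

# hypothetical cell value with the assumed nesting
sample = [{'coord': [{'Section_Page_start': 3,
                      'Section_End_Page': 7,
                      'x0': 10.5}],
           'label': 'intro'}]

pages = get_start_and_end(sample)
print(pages)
```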

Upvotes: 1
