Reputation: 85
My JSON looks like this:
json_obj = [{'extracted_value': {'other': 'Not found', 'sound': 'false', 'longterm': 'false', 'physician': 'false'}, 'page_num': '33', 'score': '0.75', 'number': 12223611, 'misc':'true'}]
df=pd.DataFrame(json_obj)[['extracted_value', 'page_num','conf_score','number']]
I am extracting only the columns above. Now I want to ignore the 'other': 'Not found' pair inside the extracted_value column and extract the remaining values as before.
Upvotes: 0
Views: 188
Reputation: 3158
You can try df['extracted_value'].apply(remove_other),
i.e. apply a function to each value in the extracted_value column.
The complete code would be:
import pandas as pd

json_obj = [{'extracted_value': {'other': 'Not found', 'sound': 'false', 'longterm': 'false', 'physician': 'false'}, 'page_num': '33', 'score': '0.75', 'number': 12223611, 'misc':'true'}]
df = pd.DataFrame(json_obj)[['extracted_value', 'page_num', 'number']]

def remove_other(my_dict):
    # Drop only the pair whose key is 'other' and whose value is 'Not found' (the comparison is case-sensitive)
    return {e: my_dict[e] for e in my_dict if not (e == 'other' and my_dict[e] == 'Not found')}

df['extracted_value'] = df['extracted_value'].apply(remove_other)
The result will be:
                                     extracted_value page_num    number
0  {'sound': 'false', 'longterm': 'false', 'physi...       33  12223611
Additional response:
df['extracted_value'].apply(remove_other)
means that each value in the column is passed as the argument to the function. You can put a print statement, print(my_dict), inside remove_other to visualize this better.
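To make that concrete, here is a minimal sketch of what apply hands to the function. The second row and the `seen` list are made up for illustration; `seen` just records the arguments instead of printing them.

```python
import pandas as pd

# Two illustrative rows; the second one is invented for this sketch.
df = pd.DataFrame({
    'extracted_value': [
        {'other': 'Not found', 'sound': 'false'},
        {'other': 'abc', 'longterm': 'true'},
    ],
    'page_num': ['33', '34'],
})

seen = []  # hypothetical helper: collects whatever the function receives

def remove_other(my_dict):
    seen.append(my_dict)  # same idea as print(my_dict)
    return {k: v for k, v in my_dict.items() if k != 'other'}

# apply calls remove_other with each cell of the column in turn
cleaned = df['extracted_value'].apply(remove_other)
```

Each entry in `seen` is one dictionary from the column, and `cleaned` is a new Series holding the filtered dictionaries.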
The code can be changed to drop the value check from the `and` condition, so that the 'other' key is removed whatever its value is:
def remove_other(my_dict):
    # Remove the 'other' key regardless of its value
    return {e: my_dict[e] for e in my_dict if e != 'other'}
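The difference between filtering on the key alone and filtering on the key plus its value can be seen in a small sketch. The function names `remove_other_pair` and `remove_other_key` and the 'see notes' value are hypothetical, chosen only for this comparison.

```python
sample = {'other': 'Not found', 'sound': 'false'}
sample_with_value = {'other': 'see notes', 'sound': 'false'}  # made-up value

def remove_other_pair(d):
    # Drops 'other' only when its value is exactly 'Not found'
    return {k: v for k, v in d.items() if not (k == 'other' and v == 'Not found')}

def remove_other_key(d):
    # Drops 'other' whatever its value is
    return {k: v for k, v in d.items() if k != 'other'}

pair_result = remove_other_pair(sample_with_value)  # 'other' is kept here
key_result = remove_other_key(sample_with_value)    # 'other' is removed here
```

Both functions give the same output for `sample`; they only differ when 'other' holds something other than 'Not found'.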
I would suggest getting familiar with JSON. In this case you need to go to [0]['coord'][0], so the function will look like this:
# Keep only Section_Page_start and Section_End_Page
def get_start_and_end(var1):
    my_dict = var1[0]['coord'][0]
    return {ek: my_dict[ek] for ek in my_dict if ek in ['Section_Page_start', 'Section_End_Page']}
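A quick way to try this out is to call the function on a hand-built value. The actual data is not shown in this thread, so the sample below is an assumption inferred only from the [0]['coord'][0] path; the numbers and the extra 'x' key are invented.

```python
def get_start_and_end(var1):
    my_dict = var1[0]['coord'][0]
    return {ek: my_dict[ek] for ek in my_dict if ek in ['Section_Page_start', 'Section_End_Page']}

# Hypothetical cell value: a list whose first item has a 'coord' list of dicts
sample = [{'coord': [{'Section_Page_start': 3, 'Section_End_Page': 7, 'x': 10.5}]}]

result = get_start_and_end(sample)  # keeps only the two page keys
```

If the real structure nests differently, only the indexing on the first line of the function needs to change.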
Upvotes: 1