Sumit Singh Chouhan
Sumit Singh Chouhan

Reputation: 1

Transform single Json columns' all rows into multiple columns

df has the following columns:

Index(

['level_id', 'description', 'level_type', 'difficulty', 'solution', 'modifier', 'last_modified', 'time_limit', 'tags', 'level_options', 'extra_info', 'hint']

, dtype='object')

column that i am converting is ="level_options"

Down below is the data inside first row of level_options:

{"marking": "normal", "options": {"1": "Carbon dioxide", "2": "Oxygen", "3": "Urine", "4": "Sweat"}, "max_score": 1, "random_options": "false", "correct_options": ["2"]}

Then in the below code: I manually pass the above first row, got the expected output is well.

>     j2=[{"marking": "normal", "options": {"1": "Yes", "2": "No"}, "max_score": 1, "random_options": "false", "correct_options": ["1"]}]
>     pd.json_normalize(j2)

output:

marking   max_score     random_options   correct_options      options.1   options.2     options.3   options.4  
normal      1             false            [2]       Carbon dioxide Oxygen         Urine        Sweat

But while iterating through all rows I don't get the output.
I have used this below code where I was supposed to get 6 lists but got empty lists.PFA of expected output- final output

Option1=[] 
Option2=[] 
Option3=[] 
Option4=[] 
Option5=[]
Correct_option=[]
 try:
>     for index, row in df.iterrows():
>         l=row["level_options"]
>         print(l) except:
>     print(index,row)
>     l = json.loads(l) 
>     print(l)
>     for i in l:
>         i = pd.normalize_json(i) 
>         print(i)
>         if i['options'] == '1':
>             Option1.append(i['value'])
>         elif i['options'] == '2':
>             Option2.append(i['value'])
>         elif i['options'] == '3':
>             Option3.append(i['value'])
>         elif i['options'] == '4':
>             Option4.append(i['value'])
>         elif i['options'] == '5':
>             Option5.append(i['value'])
>         elif i['options'] == 'correct_options':
>             Correct_option.append(i['value'])

i even used the json.loads instead of json_normalize but error occurs

below are 2 rows of dataframe= df["level_options"]

{"marking": "normal", "options": {"1": "Carbon dioxide", "2": "Oxygen", "3": "Urine", "4": "Sweat"}, "max_score": 1, "random_options": "false", "correct_options": ["2"]} {"marking": "normal", "options": {"1": "Yes", "2": "No"}, "max_score": 1, "random_options": "false", "correct_options": ["1"]}

The Output shown below is what I am expecting after merging each list as a single dataframe, the problem I am getting in is while retrieving the lists...

Thanks and regards

options.1   options.2     options.3 options.4  options.5    correct_options    
Carbon dioxide  Oxygen      Urine      Sweat       NaN               [2]    
Yes              No           NaN        NaN        NaN               [1]  

Upvotes: 0

Views: 122

Answers (1)

Ynjxsjmh
Ynjxsjmh

Reputation: 30032

Just try pd.json_normalize on column

out = (pd.json_normalize(df['level_options'])
       .filter(like='option')
       .drop('random_options', axis=1))
print(out)

  correct_options       options.1 options.2 options.3 options.4
0             [2]  Carbon dioxide    Oxygen     Urine     Sweat
1             [1]             Yes        No       NaN       NaN

Upvotes: 1

Related Questions