acbcccdc
acbcccdc

Reputation: 77

Python - Creating DataFrame with Nested Dictionary and Unequal Lengths

I searched around for some of the similarly worded questions, but I didn't find a solution, so I am making my own question.

I am using Python. I am trying to create a DataFrame from a nested dictionary (with lists), and I can't get the length of the columns correct.

My data structure is:

[
   {
      "Info":{
         "StudentID":1000009,
         "firstName":"Name",
         "middleName":"Middle",
         "lastName":"Last"
      },
      "Pass":[
         {
            "Subject":"Math",
            "Score":98
         },
         {
            "Subject":"Science",
            "Score":100
         }
      ]
   },
   {
      "Info":{
         "StudentID":1000010,
         "firstName":"Name",
         "middleName":"Middle",
         "lastName":"Last"
      },
      "Pass":[
         {
            "Subject":"Math",
            "Score":90
         },
         {
            "Subject":"Science",
            "Score":82
         },
         {
            "Subject":"English",
            "Score":99
         }
      ]
   }
]

I want the DataFrame to be:

       ID   firstName  middleName lastName   Subject1  Subject2 Subject3
0 1000009        Name      Middle     Last       Math   Science      NaN
0 1000009        Name      Middle     Last       Math   Science  English

The data is more complex (more subjects), but what ends up happening is that I get an error that all arrays must be the same length:

ValueError: All arrays must be of the same length

I tried:

p = defaultdict(list)

num = len(data)

for i in range(0,num):
    crd = data[i]['Info']['individualId']
    p['crd'].append(crd)
    
    firstName = data[i]['Info']['firstName']
    p['firstName'].append(firstName)

    middleName = data[i]['Info']['middleName ']
    p['firstName'].append(middleName)

    lastName = data[i]['Info']['lastName']
    p['lastName'].append(lastName)

    subjects= len(data[i]['Pass'])
   
    for e in range(0,subjects):
        try:
            exam = data[i]['Pass'][e]['Subject']
            p[f'Subject{e}'].append(subject)
        except:
              break

df = pd.DataFrame.from_dict(p, orient='index')
df

But the data isn't aligned - it must be tied to the correct ID. Instead, it lists the info in the order that it appears. In other words, there isn't a missing value in Subject3.

I've also tried creating a list, then creating a new list within the for loop.

rec = []

num = len(data)

for i in range(0,num):
    p = []
    crd = data[i]['Info']['individualId']
    p.append(crd)
    
    firstName = data[i]['Info']['firstName']
    p.append(firstName)

    middleName = data[i]['Info']['middleName ']
    p.append(middleName)

    lastName = data[i]['Info']['lastName']
    p.append(lastName)

    subjects= len(data[i]['Pass'])
   
    for e in range(0,subjects):
        try:
            exam = data[i]['Pass'][e]['Subject']
            p.append(subject)
        except:
              break
     rec.append(p)

df = pd.DataFrame(rec)
df

In this code, I get misaligned info. Each column doesn't have the standard info. For example, if someone doesn't have a middle name in the data, everything will get shifted to the left.

Any solutions?

Upvotes: 0

Views: 527

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150735

For your data, you can use json_normalize, then a bit of manipulation:

# put data into a frame
tmp = pd.json_normalize(data)

passes = tmp['Pass'].explode()
out = tmp.drop(columns='Pass').join(pd.DataFrame(passes.tolist(), index=passes.index))

Then out is:

   Info.StudentID Info.firstName Info.middleName Info.lastName  Subject  Score
0         1000009           Name          Middle          Last     Math     98
0         1000009           Name          Middle          Last  Science    100
1         1000010           Name          Middle          Last     Math     90
1         1000010           Name          Middle          Last  Science     82
1         1000010           Name          Middle          Last  English     99

From here, you can pivot the Subject

Upvotes: 1

Related Questions