Zumplo

Reputation: 160

Google Cloud NL API data to pandas DataFrame

I'm using the Google Cloud Natural Language API (sample_classify_text). It returns data that I've transformed into this format:

response_list = [[['a', 'b', 'c'], [1,2,3], ['url1']], [['d'], [4], ['url2']]]

From here I'd like to build a Pandas df that looks like this:

a b c 1 2 3 url1
d     4     url2

The number of results differs from one URL to the next (a, b, c = 3 results; d = 1 result). Most of the time there seem to be fewer than 4 results, but I'm not sure about that, so I'd like the solution to stay flexible.

I've tried a few things, but it gets pretty complicated. Is there an easy way to handle this?

Upvotes: 0

Views: 52

Answers (2)

Zumplo

Reputation: 160

That's what I ended up doing. Not the most elegant solution... Please don't tell me this can be done with a one-liner :D

import pandas as pd

response_list = [[['a', 'b', 'c'], [1, 2, 3], ['url1']], [['d'], [4], ['url2']]]

# Largest number of results for any URL, so the layout adapts to the data
max_results = max(len(labels) for labels, _, _ in response_list)

# One list per output column: max_results labels, max_results scores, then the URL
columns = [[] for _ in range(2 * max_results + 1)]

for labels, scores, urls in response_list:
    for idx in range(max_results):
        # Pad with None when a URL has fewer results, so each row stays aligned
        columns[idx].append(labels[idx] if idx < len(labels) else None)
        columns[max_results + idx].append(scores[idx] if idx < len(scores) else None)
    columns[-1].append(urls[0])

# Each inner list is a column, so transpose to get one row per URL
df = pd.DataFrame(columns).transpose()

print(df)  # or display(df) in a Jupyter notebook
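For comparison, here is a shorter sketch that relies on pandas itself. Building a DataFrame from a list of lists of unequal length pads the shorter rows with missing values (None/NaN), so the widest entry decides how many columns you get. The label_/score_/url column names are illustrative choices, not part of the API response, and each entry is assumed to hold exactly one URL.

```python
import pandas as pd

response_list = [[['a', 'b', 'c'], [1, 2, 3], ['url1']], [['d'], [4], ['url2']]]

# Ragged lists of lists are padded with missing values by the DataFrame constructor,
# so no maximum number of results has to be hardcoded.
labels = pd.DataFrame([entry[0] for entry in response_list]).add_prefix('label_')
scores = pd.DataFrame([entry[1] for entry in response_list]).add_prefix('score_')
# Assumes each entry holds a single URL in its third sub-list.
urls = pd.Series([entry[2][0] for entry in response_list], name='url')

# Join the three blocks side by side; rows line up on the default 0..n-1 index.
df = pd.concat([labels, scores, urls], axis=1)
print(df)
```

The add_prefix calls only keep the label and score column names from colliding (both start at 0) before pd.concat joins them.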

Upvotes: 0

Kim

Reputation: 336

Have you tried creating a pandas DataFrame directly from the list?

For example:

    import pandas as pd

    response_list = [[['a', 'b', 'c'], [1, 2, 3], ['url1']], [['d'], [4], ['url2']]]
    df = pd.DataFrame(response_list)
    print(df)

The output of print(df) is:

               0          1       2
    0  [a, b, c]  [1, 2, 3]  [url1]
    1        [d]        [4]  [url2]
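Building on that output, one way to reach the flat layout from the question is to expand each list-valued column into its own block of columns and join the blocks side by side. This is a sketch that assumes every cell in a column holds a list, as in the output above; shorter lists come out padded with missing values.

```python
import pandas as pd

response_list = [[['a', 'b', 'c'], [1, 2, 3], ['url1']], [['d'], [4], ['url2']]]
df = pd.DataFrame(response_list)  # each cell holds a list

# Turn each list-valued column into its own block of columns (shorter lists
# are padded with missing values), then renumber the columns 0..n-1.
expanded = pd.concat(
    [pd.DataFrame(df[col].tolist(), index=df.index) for col in df.columns],
    axis=1,
    ignore_index=True,
)
print(expanded)
```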

Upvotes: 1
