Nested list to list into new column

data = {
    'date': ['2020-04-27', '2020-04-27', '2020-04-27'],
    'user': ['Steeve', 'Pam', 'Olive'],
    'mentions': ["['sport', 'basket']", "['politique']", "[]"],
    'reply_to': [
        "[{'user_id': '123', 'username': 'aaa'}, {'user_id': '234', 'username': 'bbb'}, {'user_id': '456', 'username': 'ccc'}]",
        "[{'user_id': '567', 'username': 'zzz'}, {'user_id': '458', 'username': 'vfd'}]",
        "[{'user_id': '666', 'username': 'ggg'}]"],
    'text': ['textfromSteeve', 'textfromPam', 'textfromOlive']
}

stack = pd.DataFrame(data, columns=['date', 'user','mentions','reply_to','text'])

From this dataframe, I'm trying to convert both mentions and reply_to columns into nested list. The goal is then to apply an pandas explode function to display one row for each number of mentions. For instance, I'd like 3 rows of user 'Pam' with one mention for each line (Steeve, Olive and Marc).

So far, I've done the following:

def nested_list(li):
    temp = []
    for elem in li:
        temp.append([elem]) 
    return temp
stack['mentions_nested= stack.mentions.apply(lambda x: nested_list(x))
stack['replies_nested= stack.reply_to.apply(lambda x: nested_list(x))

The problem is when there is only only one name (string) in the column. It splits each letter into a distinct list (ex: [[P], [a], [m]]).

Regarding to the reply_to column, where the dictionary's length is equal to 1, it returns something like this: [[id],[username]].

Do you guys have any idea on how I could do this?

FYI: I'm not going to apply the explode function on both mentions an reply_to columns on the meantime. This is going to be two different process.python

Upvotes: 0

Views: 1637

Answers (2)

sammywemmy
sammywemmy

Reputation: 28644

use json normalize on the reply to column

from pandas import json_normalize

res = pd.concat([json_normalize(ent)
                .assign(index=ind)
                 for ind,ent
                in zip(stack.index,stack['reply_to'])
                ])

res

     id username    index
0   123     alpha   0
0   231     beta    1
1   4580    omega   1
0   789     olo     2

join back to the original dataframe, with some cleanups

  (stack
   .join(res.set_index('index'))
   .explode('mentions')
   .drop('reply_to',axis=1)
   )

    user    mentions    text         id    username
0   Steeve  Pam     textfromSteeve  123     alpha
1   Pam    Steeve   textfromPam     231     beta
1   Pam     Olive   textfromPam     231     beta
1   Pam     Marc    textfromPam     231     beta
1   Pam     Steeve  textfromPam     4580    omega
1   Pam     Olive   textfromPam     4580    omega
1   Pam     Marc    textfromPam     4580    omega
2   Olive   Paul    textfromOlive   789     olo
2   Olive   Lou     textfromOlive   789     olo

Upvotes: 0

jezrael
jezrael

Reputation: 862601

I believe you need replace non list values to lists with map and isinstance:

for c in ['mentions','reply_to']:
    stack[c] = stack[c].map(lambda x: x if isinstance(x, list) else [x])

print (stack)
     user               mentions  \
0  Steeve                  [Pam]   
1     Pam  [Steeve, Olive, Marc]   
2   Olive            [Paul, Lou]   

                                            reply_to            text  
0               [{'id': '123', 'username': 'alpha'}]  textfromSteeve  
1  [{'id': '231', 'username': 'beta'}, {'id': '45...     textfromPam  
2                 [{'id': '789', 'username': 'olo'}]   textfromOlive  

Then is possible create dictionaries in list comprehension with assign key from index values, pass to DataFrame and last use DataFrame.join for original, last use DataFrame.explode:

L = [dict(**{'idx':k}, **x) for 
     k, v in stack.pop('reply_to').items() 
     for x in v]

df = pd.DataFrame(L).join(stack, on='idx').explode('mentions').reset_index(drop=True)
print (df)
   idx    id username    user mentions            text
0    0   123    alpha  Steeve      Pam  textfromSteeve
1    1   231     beta     Pam   Steeve     textfromPam
2    1   231     beta     Pam    Olive     textfromPam
3    1   231     beta     Pam     Marc     textfromPam
4    1  4580    omega     Pam   Steeve     textfromPam
5    1  4580    omega     Pam    Olive     textfromPam
6    1  4580    omega     Pam     Marc     textfromPam
7    2   789      olo   Olive     Paul   textfromOlive
8    2   789      olo   Olive      Lou   textfromOlive

Upvotes: 1

Related Questions