How to turn a nested list with multiple values of different length into a pandas dataframe in python?

Question

I am collecting follower ids (between 1,000 and 25,000 per account) from 50 Twitter accounts and was able to store these follower ids in a JSON file in a format similar to this:


[
    36146779,
    [
        170742628,
        3597763396,
        13453212,
        24763726,
        19087188,
        19605181,
        37374972
    ],
    22971125,
    [
        1114702974,
        1145981566365130758,
        1118409958561685504,
        822439041312423941,
        1110524937788424197,
        807718095460581376,
        24763726,
        3181477874,
        1076870147980300288,
        307465302,
    ],
    24763726,
    [........

What I am trying to do is find all follower ids that appear under more than one account, e.g. person 24763726 follows both account 36146779 and account 22971125. Any recommendations on how to solve this problem? I am quite new to Python and programming in general and would be very thankful for any help or advice!

So far I was able to turn the saved data (in JSON format) into a pandas DataFrame, but it is not in the form I would like.

import json
import pandas as pd

# Import the data
with open("2019_07_02_eco copy.json", "r", encoding="utf-8") as f:
    data_list = json.load(f)

# Create a pandas DataFrame with the follower ids 
df = pd.DataFrame(data_list)

print(df.head)

What I expected was a pandas DataFrame with the account ids (of the 50 accounts) as column headers and the follower ids in the rows below them.

What I got was this:

[194 rows x 1 columns]

ncica · Accepted Answer

Try this:

data = [
    36146779,
    [
        170742628,
        3597763396,
        13453212,
        24763726,
        19087188,
        19605181,
        37374972
    ],
    22971125,
    [
        1114702974,
        1145981566365130758,
        1118409958561685504,
        822439041312423941,
        1110524937788424197,
        807718095460581376,
        24763726,
        3181477874,
        1076870147980300288,
        307465302,
    ],
    24763726,
    [
        1145981566365130758,
        1118409958561685504,
        822439041312423941,
        1110524937788424197,
        22971125
    ]
]

# Convert the flat list into a dictionary: account id -> list of follower ids.
d = {}
for i in range(0, len(data) - 1, 2):
    d[str(data[i])] = data[i + 1]

def getKeys(dictOfElements, valueToFind):
    """Return every key whose list of values contains valueToFind."""
    listOfKeys = []
    for key, values in dictOfElements.items():
        if valueToFind in values:
            listOfKeys.append(key)
    return listOfKeys


# For each account id, list the accounts whose follower lists contain it.
for key in d.keys():
    keys = ",".join(getKeys(d, int(key)))
    print("person: {}, follows accounts: {}".format(key, keys))

Output:

person: 36146779, follows accounts: 
person: 22971125, follows accounts: 24763726
person: 24763726, follows accounts: 36146779,22971125
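
Since the question also asks for a pandas DataFrame, here is a minimal sketch of one way to get there from the same flat list. It is not part of the code above. The names `followers`, `wide`, `pairs` and `shared_accounts` are made up for illustration, and the sample data is shortened. It assumes the list always alternates between an account id and that account's list of follower ids.

```python
import pandas as pd

# Shortened sample in the same flat layout: account id, then its follower ids.
data = [
    36146779, [170742628, 3597763396, 13453212, 24763726],
    22971125, [1114702974, 1145981566365130758, 24763726, 307465302],
    24763726, [1145981566365130758, 22971125],
]

# Pair each account id (even positions) with its follower list (odd positions).
followers = dict(zip(data[0::2], data[1::2]))

# Wide layout: one column per account id. The nullable "Int64" dtype pads the
# shorter columns with <NA>, so the large ids are not converted to floats.
wide = pd.DataFrame(
    {account: pd.Series(ids, dtype="Int64") for account, ids in followers.items()}
)

# Long layout: one row per (account, follower) pair.
pairs = pd.DataFrame(
    [(account, fid) for account, ids in followers.items() for fid in ids],
    columns=["account", "follower"],
)

# Keep only the followers that appear under more than one account.
counts = pairs.groupby("follower")["account"].nunique()
shared = pairs[pairs["follower"].isin(counts[counts > 1].index)]
shared_accounts = shared.groupby("follower")["account"].apply(list).to_dict()

print(wide)
print(shared_accounts)
```

The wide layout matches the shape described in the question (account ids as column headers), while the long layout is usually easier to query: finding shared followers becomes a single `groupby` instead of a comparison across columns of different lengths.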
