Si_CPyR

Reputation: 161

Python List to Dataframe with conditions

I have a long list (sample below)

df_list = ['Joe',
 'UK',
 'Buyout',
 '10083',
 '4323',
 'http://info2.com',
 'Linda',
 'US',
 'Liquidate',
 '97656',
 '1223',
 'http://global.com',
 '[email protected]'           
          ]

As you can see, the list contains information about individuals (here, Joe and Linda). The problem is that some observations (Joe in this example) are missing the 7th element, which is the entity's email address, while others (Linda) have it populated.

I want to turn this list into a dataframe with the 7 columns below. For observations that do not have a valid email address (the 7th element does not contain "@"), I want the EMAIL column to hold a null/empty value rather than the next element, which is actually the next observation's NAME.

cols = ['NAME'
,'COUNTRY'
,'STRATEGIES'
,'TOTAL FUNDS'
,'ESTIMATED PAYOFF'
,'WEBSITE'
,'EMAIL']

This is what I have so far:

big_list = []  #intention is to append N (number of unique entity) small_lists into a big_list and call pd.DataFrame(big_list)
small_list = [] #intention is to create a small_list for each observation/entity, containing 7 values, including email or null if empty
for element in df_list:
    small_list.append(element)
if ("@" not in small_list):
    small_list[-1] = None

Any help would be highly appreciated! Thanks

Upvotes: 2

Views: 83

Answers (2)

kederrac

Reputation: 17322

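You could use a generator that walks the list one record at a time and checks whether the element after the six fixed fields is an email: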

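import numpy as np
import pandas as pd

def gen_batch(df_list):
    # i always points at the slot where the current record's email would be
    i = 6
    while i <= len(df_list):
        if i < len(df_list) and '@' in df_list[i]:
            # email present: the record spans 7 elements
            yield df_list[i-6: i+1]
            i += 7
        else:
            # email missing: pad the 6 fixed fields with NaN
            yield df_list[i-6: i] + [np.nan]
            i += 6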

pd.DataFrame(gen_batch(df_list), columns=cols)  
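
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The names `split_records`, `COLS`, and `n_fixed` are made up for illustration, and the email `linda@example.com` is a hypothetical placeholder because the real address is obscured in the question. It assumes every record has all six fixed fields and that only the email can be missing.

```python
import pandas as pd

COLS = ['NAME', 'COUNTRY', 'STRATEGIES', 'TOTAL FUNDS',
        'ESTIMATED PAYOFF', 'WEBSITE', 'EMAIL']

def split_records(items, n_fixed=6):
    """Yield 7-value rows; EMAIL is None when the next item lacks '@'."""
    i = 0
    while i < len(items):
        # take the six fixed fields of the current record
        row = items[i:i + n_fixed]
        i += n_fixed
        if i < len(items) and '@' in items[i]:
            # the following element is this record's email
            row.append(items[i])
            i += 1
        else:
            # no email: the following element (if any) starts the next record
            row.append(None)
        yield row

sample = ['Joe', 'UK', 'Buyout', '10083', '4323', 'http://info2.com',
          'Linda', 'US', 'Liquidate', '97656', '1223', 'http://global.com',
          'linda@example.com']  # hypothetical address, not from the question

rows = list(split_records(sample))
df = pd.DataFrame(rows, columns=COLS)
print(df)
```

Padding inside the generator means every row already has seven values, so the frame does not depend on pandas filling in short rows.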


Upvotes: 1

Sociopath

Reputation: 13401

IIUC you need:

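import pandas as pd

new_list = []
counter = 0
while counter < len(df_list):
    # a record has an email only if a 7th element exists and contains "@"
    has_email = counter + 6 < len(df_list) and "@" in df_list[counter + 6]
    if has_email:
        new_list.append(df_list[counter:counter + 7])
        counter += 7
    else:
        # pad with None so every row has 7 values,
        # including a final record that has no email
        new_list.append(df_list[counter:counter + 6] + [None])
        counter += 6

df = pd.DataFrame(new_list, columns=cols)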

print(df)

Output:

    NAME COUNTRY STRATEGIES TOTAL FUNDS ESTIMATED PAYOFF            WEBSITE  \
0    Joe      UK     Buyout       10083             4323   http://info2.com   
1  Linda      US  Liquidate       97656             1223  http://global.com   

              EMAIL  
0              None  
1  [email protected]

Upvotes: 1
