Ayman hebat

Reputation: 9

Convert a single list to a DataFrame with multiple columns

How can I create a DataFrame with 4 columns from a single flat list like this:

['Dave',  '2008-09-20',  '2020-05-31',  '[email protected]',  'Steve',  '2009-01-23',  '2020-04-30',  '[email protected]',  'Rob', '2007-02-14',  '2020-04-30',  '[email protected]',  'Ryan',  '2010-02-11', '2020-03-10',  '[email protected]']

I tried this code, but it doesn't work:

import pandas as pd
df=pd.Series(data[0].splitlines()).str.split(',',expand=True).T.set_index(0).T.dropna()
df

Upvotes: 0

Views: 88

Answers (4)

LevB

Reputation: 953

It might be worth coming up with column names and giving each person an ID before you create the dataframe. The good news is that once that's done, you don't need any loops here, making the conversion efficient. The pivot function will give each person their own row.

import pandas as pd
lst =['Dave',  '2008-09-20',  '2020-05-31',   '[email protected]',  'Steve',  '2009-01-23',  '2020-04-30',  '[email protected]',  'Rob', '2007-02-14',  '2020-04-30',  '[email protected]',  'Ryan',  '2010-02-11', '2020-03-10',  '[email protected]']

row_num = len(lst)//4
cols = ['name','start_date','end_date','email']*row_num
ids = sorted(list(range(1, row_num + 1)) * 4)  # one ID per person, repeated once per column

df = pd.DataFrame([ids,cols,lst]).T.pivot(index=0,columns=1)[2]
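One thing to watch with the pivot approach: the pivoted columns should come out sorted by label (email, end_date, name, start_date) rather than in the original field order. Below is a minimal sketch of restoring that order, assuming the same column names as above. The names `fields` and `long_df` are only illustrative.

```python
import pandas as pd

lst = ['Dave', '2008-09-20', '2020-05-31', '[email protected]',
       'Steve', '2009-01-23', '2020-04-30', '[email protected]',
       'Rob', '2007-02-14', '2020-04-30', '[email protected]',
       'Ryan', '2010-02-11', '2020-03-10', '[email protected]']

fields = ['name', 'start_date', 'end_date', 'email']  # illustrative column names
row_num = len(lst) // len(fields)

# One ID per person, repeated once per field: 1, 1, 1, 1, 2, 2, 2, 2, ...
ids = [i for i in range(1, row_num + 1) for _ in fields]
cols = fields * row_num

# Long format: one row per (person, field) pair.
long_df = pd.DataFrame({'id': ids, 'field': cols, 'value': lst})

# Wide format: one row per person, one column per field.
df = long_df.pivot(index='id', columns='field', values='value')

# Select the columns explicitly to get back the original field order.
df = df[fields]
print(df)
```

Building `ids` from `range(1, row_num + 1)` keeps the sketch valid for any number of people, not only four.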

Upvotes: 0

I highly recommend just using the most basic approach.

import pandas as pd
arr = ['Dave',  '2008-09-20',  '2020-05-31',  '[email protected]',  'Steve',  '2009-01-23',  '2020-04-30',  '[email protected]',  'Rob', '2007-02-14',  '2020-04-30',  '[email protected]',  'Ryan',  '2010-02-11', '2020-03-10',  '[email protected]']
mat = []
for x in range(0,len(arr),4):
    mat.append(arr[x:x+4])
print(pd.DataFrame(mat))

Upvotes: 0

cs95

Reputation: 402814

You can use numpy to reshape the array before loading it:

import numpy as np
pd.DataFrame(np.array(lst).reshape(-1, 4))

       0           1           2                3
0   Dave  2008-09-20  2020-05-31  [email protected]
1  Steve  2009-01-23  2020-04-30  [email protected]
2    Rob  2007-02-14  2020-04-30    [email protected]
3   Ryan  2010-02-11  2020-03-10   [email protected]
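For completeness, here is a self-contained sketch of the reshape approach with the imports and the list included. The column names are an assumption borrowed from another answer in this thread; the question itself does not name them.

```python
import numpy as np
import pandas as pd

lst = ['Dave', '2008-09-20', '2020-05-31', '[email protected]',
       'Steve', '2009-01-23', '2020-04-30', '[email protected]',
       'Rob', '2007-02-14', '2020-04-30', '[email protected]',
       'Ryan', '2010-02-11', '2020-03-10', '[email protected]']

columns = ['name', 'start_date', 'end_date', 'email']  # assumed column names

# reshape(-1, 4) infers the number of rows from the list length.
# NumPy raises a ValueError if the length is not a multiple of 4.
df = pd.DataFrame(np.array(lst).reshape(-1, 4), columns=columns)
print(df)
```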

Upvotes: 4

dimay

Reputation: 2804

Try this:

import pandas as pd
lst = ['Dave',  '2008-09-20',  '2020-05-31',  '[email protected]',  'Steve',  '2009-01-23',  '2020-04-30',  '[email protected]',  'Rob', '2007-02-14',  '2020-04-30',  '[email protected]',  'Ryan',  '2010-02-11', '2020-03-10',  '[email protected]']
df = pd.DataFrame([lst[i:i+4] for i in range(0,len(lst),4)])
print(df)

Output

       0           1           2                3
0   Dave  2008-09-20  2020-05-31  [email protected]
1  Steve  2009-01-23  2020-04-30  [email protected]
2    Rob  2007-02-14  2020-04-30  [email protected]
3   Ryan  2010-02-11  2020-03-10  [email protected]
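Note that the slicing approach does not complain when the list length is not a multiple of 4: the last row is simply shorter and pandas fills the gap with a missing value. If that should count as an error, a length check can be added up front. The sketch below is one way to do it; `chunk_to_frame` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def chunk_to_frame(items, width=4):
    """Split a flat list into rows of `width` items (hypothetical helper)."""
    if len(items) % width != 0:
        # Fail loudly instead of producing a partially empty last row.
        raise ValueError(
            f"expected a multiple of {width} items, got {len(items)}"
        )
    rows = [items[i:i + width] for i in range(0, len(items), width)]
    return pd.DataFrame(rows)


lst = ['Dave', '2008-09-20', '2020-05-31', '[email protected]',
       'Steve', '2009-01-23', '2020-04-30', '[email protected]',
       'Rob', '2007-02-14', '2020-04-30', '[email protected]',
       'Ryan', '2010-02-11', '2020-03-10', '[email protected]']

df = chunk_to_frame(lst)
print(df)
```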

Upvotes: 1
