Reputation: 9
How can I create a DataFrame with 4 columns from a single flat list like this?
['Dave', '2008-09-20', '2020-05-31', '[email protected]', 'Steve', '2009-01-23', '2020-04-30', '[email protected]', 'Rob', '2007-02-14', '2020-04-30', '[email protected]', 'Ryan', '2010-02-11', '2020-03-10', '[email protected]']
I tried this code, but it doesn't work:
import pandas as pd
df=pd.Series(data[0].splitlines()).str.split(',',expand=True).T.set_index(0).T.dropna()
df
Upvotes: 0
Views: 88
Reputation: 953
It might be worth coming up with column names and giving each person an ID before you create the dataframe. The good news is that once that's done, you don't need any loops here, making the conversion efficient. The pivot function will give each person their own row.
import pandas as pd
lst =['Dave', '2008-09-20', '2020-05-31', '[email protected]', 'Steve', '2009-01-23', '2020-04-30', '[email protected]', 'Rob', '2007-02-14', '2020-04-30', '[email protected]', 'Ryan', '2010-02-11', '2020-03-10', '[email protected]']
row_num = len(lst)//4
cols = ['name','start_date','end_date','email']*row_num
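# One ID per person, each repeated once per field (1,1,1,1,2,2,2,2,...).
# Repeating range(1, row_num + 1) four times keeps this correct for any number of people.
ids = sorted(list(range(1, row_num + 1)) * 4)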
df = pd.DataFrame([ids,cols,lst]).T.pivot(index=0,columns=1)[2]
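One thing to watch with the pivot approach: the pivoted columns tend to come back ordered by label (email, end_date, name, start_date) rather than in the order the fields appear in the list. Here is a minimal sketch of one way to restore the intended order. The names `fields`, `long_df` and `wide` are only illustrative, and the column names are the same assumed ones used above.

```python
import pandas as pd

lst = ['Dave', '2008-09-20', '2020-05-31', '[email protected]',
       'Steve', '2009-01-23', '2020-04-30', '[email protected]',
       'Rob', '2007-02-14', '2020-04-30', '[email protected]',
       'Ryan', '2010-02-11', '2020-03-10', '[email protected]']

# Assumed column names; the list itself carries no header information.
fields = ['name', 'start_date', 'end_date', 'email']
row_num = len(lst) // len(fields)

# Person ID for every list item: 1,1,1,1,2,2,2,2,...
ids = [i // len(fields) + 1 for i in range(len(lst))]
cols = fields * row_num

# Long format: one row per (person, field, value) triple.
long_df = pd.DataFrame({'id': ids, 'field': cols, 'value': lst})
wide = long_df.pivot(index='id', columns='field', values='value')

# Select the columns explicitly so the result follows the original field order.
wide = wide[fields]
wide.columns.name = None
print(wide)
```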
Upvotes: 0
Reputation: 1
I highly recommend just doing this the most basic way: slice the list into chunks of four and build the DataFrame from those rows.
import pandas as pd
arr = ['Dave', '2008-09-20', '2020-05-31', '[email protected]', 'Steve', '2009-01-23', '2020-04-30', '[email protected]', 'Rob', '2007-02-14', '2020-04-30', '[email protected]', 'Ryan', '2010-02-11', '2020-03-10', '[email protected]']
mat = []
# Step through the list four items at a time; each slice becomes one row.
for x in range(0, len(arr), 4):
    mat.append(arr[x:x+4])
print(pd.DataFrame(mat))
Upvotes: 0
Reputation: 402814
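You can use numpy to reshape the list into rows of four before loading it (this assumes `import numpy as np`, `import pandas as pd`, and that `lst` is the list from the question):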
pd.DataFrame(np.array(lst).reshape(-1, 4))
       0           1           2                  3
0   Dave  2008-09-20  2020-05-31  [email protected]
1  Steve  2009-01-23  2020-04-30  [email protected]
2    Rob  2007-02-14  2020-04-30  [email protected]
3   Ryan  2010-02-11  2020-03-10  [email protected]
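If you also want named columns, the reshape result can be passed along with a `columns` argument. The sketch below assumes the column names `name`, `start_date`, `end_date` and `email`, which are not part of the original list. It also shows that `reshape` raises a `ValueError` when the list length is not a multiple of 4, which can be a useful check for incomplete records.

```python
import numpy as np
import pandas as pd

lst = ['Dave', '2008-09-20', '2020-05-31', '[email protected]',
       'Steve', '2009-01-23', '2020-04-30', '[email protected]',
       'Rob', '2007-02-14', '2020-04-30', '[email protected]',
       'Ryan', '2010-02-11', '2020-03-10', '[email protected]']

# Assumed column names; pick whatever fits your data.
columns = ['name', 'start_date', 'end_date', 'email']

# -1 lets numpy infer the number of rows from the list length.
df = pd.DataFrame(np.array(lst).reshape(-1, len(columns)), columns=columns)
print(df)

# A list with a missing item cannot be reshaped into rows of four.
try:
    np.array(lst[:-1]).reshape(-1, len(columns))
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print('Incomplete record:', exc)
```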
Upvotes: 4
Reputation: 2804
Try this:
import pandas as pd
lst = ['Dave', '2008-09-20', '2020-05-31', '[email protected]', 'Steve', '2009-01-23', '2020-04-30', '[email protected]', 'Rob', '2007-02-14', '2020-04-30', '[email protected]', 'Ryan', '2010-02-11', '2020-03-10', '[email protected]']
df = pd.DataFrame([lst[i:i+4] for i in range(0,len(lst),4)])
print(df)
Output
       0           1           2                  3
0   Dave  2008-09-20  2020-05-31  [email protected]
1  Steve  2009-01-23  2020-04-30  [email protected]
2    Rob  2007-02-14  2020-04-30  [email protected]
3   Ryan  2010-02-11  2020-03-10  [email protected]
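All values in the resulting frame are strings. If the two middle columns are meant to be dates, a possible follow-up is to name the columns and convert them with `pd.to_datetime`. This is only a sketch: the column names are an assumption, and it presumes every date is in `YYYY-MM-DD` form as in the sample.

```python
import pandas as pd

lst = ['Dave', '2008-09-20', '2020-05-31', '[email protected]',
       'Steve', '2009-01-23', '2020-04-30', '[email protected]',
       'Rob', '2007-02-14', '2020-04-30', '[email protected]',
       'Ryan', '2010-02-11', '2020-03-10', '[email protected]']

# Assumed column names; the list itself has no header.
columns = ['name', 'start_date', 'end_date', 'email']

# Same chunking as above, with the chunk width taken from the column count.
width = len(columns)
df = pd.DataFrame([lst[i:i + width] for i in range(0, len(lst), width)],
                  columns=columns)

# Convert the date strings to datetime64 so they sort and compare as dates.
for col in ['start_date', 'end_date']:
    df[col] = pd.to_datetime(df[col], format='%Y-%m-%d')

print(df.dtypes)
```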
Upvotes: 1