Reputation: 35696
In the following Pandas DataFrame
,
df = pd.DataFrame({'session' : ["1","1","2","2","3","3"],
'path' : ["p1","p2","p1","p2","p2","p3"], 'seconds' : ["20","21","132","10","24","45"]})
I need to get an output like the following. (Pages as columns, sessions as rows and seconds in each cell.)
session,p1,p2,p3
1,20,21,0
2,132,10,0
3,0,24,45
What I have done so far.
In [76]: wordlist = ['p1', 'p2', 'p3']
In [77]: df2 = pd.DataFrame(df.groupby('session').apply(lambda x: ','.join(x.path)))
In [78]: df2 #I have renamed the columns
Out[78]:
path
session
1 p1,p2
2 p1,p2
3 p2,p3
In [79]: df3 = pd.DataFrame(df.groupby('session').apply(lambda x: ','.join(x.seconds.astype(str))))
In [80]: df3 #I have renamed the columns
Out[80]:
path
session
1 20,21
2 132,10
3 24,45
The following just gives the boolean result. I need to get my expected output. Any help on this?
In [84]: pd.DataFrame({name : df2["path"].str.contains(name) for name in wordlist})
Out[84]:
p1 p2 p3
session
1 True True False
2 True True False
3 False True True
Upvotes: 1
Views: 79
Reputation: 6562
Use a pivot table:
df.pivot(index='session', columns='path')
Then replace all the Nan's with zeros:
df2 = df1.fillna(0)
This gives you the following output:
seconds
path p1 p2 p3
session
1 20 21 0
2 132 10 0
3 0 24 45
Then you might want to drop the multiindex column:
df1.columns = df1.columns.droplevel(0)
Yielding your desired solution (sans commas):
path p1 p2 p3
session
1 20 21 0
2 132 10 0
3 0 24 45
Finally you can convert it to comma-separated string using StringIO
:
import StringIO
s = StringIO.StringIO()
df1.to_csv(s)
print s.getvalue()
With the following output:
session,p1,p2,p3
1,20,21,0
2,132,10,0
3,0,24,45
Upvotes: 2