Reputation: 15
I have used the following code with the unique() function in pandas to create a column which then contains a list of unique values:
import pandas as pd
from collections import OrderedDict
dct = OrderedDict([
('referencenum',['10','10','20','20','20','30','30','40']),
('Month',['Jan','Jan','Jan','Feb','Feb','Feb','Feb','Mar']),
('Category',['good','bad','bad','bad','bad','good','bad','bad'])
])
df = pd.DataFrame.from_dict(dct)
This gives the following sample dataset:
  referencenum Month Category
0           10   Jan     good
1           10   Jan      bad
2           20   Jan      bad
3           20   Feb      bad
4           20   Feb      bad
5           30   Feb     good
6           30   Feb      bad
7           40   Mar      bad
Then I summarise as follows:
dfsummary = pd.DataFrame(df.groupby(['referencenum', 'Month'])['Category'].unique())
dfsummary = dfsummary.reset_index()
This gives the summary dataframe, with the "Category" column containing a list of unique values:
  referencenum Month     Category
0           10   Jan  [good, bad]
1           20   Feb        [bad]
2           20   Jan        [bad]
3           30   Feb  [good, bad]
4           40   Mar        [bad]
My question is how do I obtain another column containing the len() or number of items in the Category "list" column?
Also - how do I extract the first/second item in the list to another column?
Can I do these manipulations within pandas or do I somehow need to drop out to list manipulations and then come back to pandas?
Many thanks!
Upvotes: 1
Views: 1774
Reputation: 599
If you want the number of elements in each entry of the Category column, pass the built-in len() function to apply():
dfsummary['Category_len'] = dfsummary['Category'].apply(len)
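For reference, here is a minimal end-to-end sketch of the line above, rebuilt from the question's sample data. The column name Category_len comes from this answer; building the frame from a plain dict instead of an OrderedDict is a simplification made here.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'referencenum': ['10', '10', '20', '20', '20', '30', '30', '40'],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb', 'Feb', 'Mar'],
    'Category': ['good', 'bad', 'bad', 'bad', 'bad', 'good', 'bad', 'bad'],
})

# One row per (referencenum, Month); Category holds the unique values per group
dfsummary = (
    df.groupby(['referencenum', 'Month'])['Category']
      .unique()
      .reset_index()
)

# Apply the built-in len() to each entry of the Category column
dfsummary['Category_len'] = dfsummary['Category'].apply(len)
print(dfsummary)
```

Since apply() calls an ordinary Python function on each entry, the same pattern works for any per-row computation on the column.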
Upvotes: 0
Reputation: 19885
You should check out the accessors.
Basically, they're ways to handle the values contained in a Series that are specific to their type (datetime, string, etc.).
In this case, you would use dfsummary['Category'].str.len().
If you wanted the first element, you would use dfsummary['Category'].str[0].
To generalise: despite its name, the .str accessor also works element-wise on sequence-like values such as lists, so you can use it to take the length of, or index into, each entry of a Series.
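A short sketch of the accessor approach on the question's data, covering the count as well as the first and second items. The new column names (n_categories, first, second) are made up for illustration. Note that .str.get(1) yields NaN where an entry has no second item.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'referencenum': ['10', '10', '20', '20', '20', '30', '30', '40'],
    'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb', 'Feb', 'Mar'],
    'Category': ['good', 'bad', 'bad', 'bad', 'bad', 'good', 'bad', 'bad'],
})

dfsummary = (
    df.groupby(['referencenum', 'Month'])['Category']
      .unique()
      .reset_index()
)

# Hypothetical column names, chosen for this example only
dfsummary['n_categories'] = dfsummary['Category'].str.len()  # items per entry
dfsummary['first'] = dfsummary['Category'].str[0]            # first item
dfsummary['second'] = dfsummary['Category'].str.get(1)       # second item, NaN if absent
print(dfsummary)
```

This stays entirely within pandas, so there is no need to drop out to plain list manipulation and come back.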
Upvotes: 1