I'm having a pandas issue.
So this is my DataFrame:
user page_number page_parts_of_speech
Anne 1 [('Hi', 'NP'), ('my', 'PP'), ('name', 'NN'), ('is', 'VB'), ('Anne', 'NP')]
John 2 [('Hi', 'NP'), ('my', 'PP'), ('name', 'NN'), ('is', 'VB'), ('John', 'NP')]
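To make the question reproducible, the frame above can be rebuilt as in the sketch below. It assumes the tags are plain strings such as 'NP' (the original display showed them unquoted) and that each cell holds a Python list of (word, tag) tuples.

```python
import pandas as pd

# Rebuild the example frame; assumes tags are plain strings like 'NP'
df = pd.DataFrame({
    'user': ['Anne', 'John'],
    'page_number': [1, 2],
    'page_parts_of_speech': [
        [('Hi', 'NP'), ('my', 'PP'), ('name', 'NN'), ('is', 'VB'), ('Anne', 'NP')],
        [('Hi', 'NP'), ('my', 'PP'), ('name', 'NN'), ('is', 'VB'), ('John', 'NP')],
    ],
})
print(df)
```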
I want to add a new column, set_of_parts_of_speech, containing a set of all words in the page_parts_of_speech column that are paired with the tag NP.
A sample output would be:
user page_number page_parts_of_speech set_of_parts_of_speech
Anne 1 [('Hi', 'NP'), ('my', 'PP'), ('name', 'NN'), ('is', 'VB'), ('Anne', 'NP')] {'Hi', 'Anne'}
John 2 [('Hi', 'NP'), ('my', 'PP'), ('name', 'NN'), ('is', 'VB'), ('John', 'NP')] {'Hi', 'John'}
It is really important that the set_of_parts_of_speech column contains an actual set.
Any help on this issue will be highly appreciated.
Use apply with a comprehension that keeps only the words tagged NP:
print(type(df.loc[0, 'page_parts_of_speech']))
<class 'list'>
# keep the word from each (word, tag) tuple whose tag is 'NP', collected into a set
f = lambda x: {y[0] for y in x if y[1] == 'NP'}
df['set_of_parts_of_speech'] = df['page_parts_of_speech'].apply(f)
print(df)
user page_number page_parts_of_speech \
0 Anne 1 [(Hi, NP), (my, PP), (name, NN), (is, VB), (An...
1 John 2 [(Hi, NP), (my, PP), (name, NN), (is, VB), (Jo...
set_of_parts_of_speech
0 {Hi, Anne}
1 {Hi, John}
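The type check matters: if it prints <class 'str'> instead (for example, after the frame was written to and read back from a CSV), the lambda would iterate over the characters of the string and fail. The sketch below assumes such cells are string representations that ast.literal_eval can parse (tags quoted); the helper as_list is hypothetical. It also builds the column with a plain list comprehension instead of apply and checks that every cell is a real Python set.

```python
import ast

import pandas as pd

# Simulate cells that arrive as strings, e.g. after a CSV round trip (assumption)
df = pd.DataFrame({
    'user': ['Anne', 'John'],
    'page_number': [1, 2],
    'page_parts_of_speech': [
        "[('Hi', 'NP'), ('my', 'PP'), ('name', 'NN'), ('is', 'VB'), ('Anne', 'NP')]",
        "[('Hi', 'NP'), ('my', 'PP'), ('name', 'NN'), ('is', 'VB'), ('John', 'NP')]",
    ],
})


def as_list(cell):
    """Hypothetical helper: parse a string repr safely, pass real lists through."""
    return ast.literal_eval(cell) if isinstance(cell, str) else cell


pos = df['page_parts_of_speech'].map(as_list)

# One set per row: the words whose tag is exactly 'NP'
df['set_of_parts_of_speech'] = [
    {word for word, tag in pairs if tag == 'NP'} for pairs in pos
]

# Every cell should now be an actual set, not a list
assert all(isinstance(s, set) for s in df['set_of_parts_of_speech'])
print(df['set_of_parts_of_speech'])
```

Unlike eval, ast.literal_eval only accepts Python literals, so it will not execute arbitrary code stored in the file.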