I am running the following code in a Jupyter notebook. It checks the strings of text in nametest_df['text'] and returns person names. I managed to get this working, and now I would like to write each name into the corresponding row of nametest_df['name'], where all values are currently NaN.

I tried the Series.replace() method, but every entry in the 'name' column ends up showing the same name.

Any clue how I can do this efficiently?
# st is the NER tagger set up earlier (e.g. NLTK's StanfordNERTagger); not shown here
for word in nametest_df['text']:
    for sent in nltk.sent_tokenize(word):
        tokens = nltk.tokenize.word_tokenize(sent)
        tags = st.tag(tokens)
        for tag in tags:
            if tag[1]=='PERSON':
                name = tag[0]
                print(name)
                nametest_df.name = nametest_df.name.replace({"NaN": name})
Sample nametest_df:

    text                        name
0   His name is John            NaN
1   I went to the beach         NaN
2   My friend is called Fred    NaN

Expected output:

    text                        name
0   His name is John            John
1   I went to the beach         NaN
2   My friend is called Fred    Fred
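A minimal reproduction of the symptom, as a sketch under two assumptions: the 'name' column holds real missing values (so `fillna` stands in for the `replace` call), and the tagger's output is hard-coded instead of coming from NLTK. Because the fill is applied to the whole column on every pass, the first name found overwrites every missing row at once, and later names find nothing left to fill.

```python
import numpy as np
import pandas as pd

# Sample data from the question; 'name' starts out entirely missing
df = pd.DataFrame({
    'text': ['His name is John', 'I went to the beach', 'My friend is called Fred'],
    'name': pd.Series([np.nan] * 3, dtype=object),
})

# Hard-coded names standing in for what the tagger finds, in row order (assumption)
for name in ['John', 'Fred']:
    # fillna acts on the whole column, so the first name fills every missing row
    df['name'] = df['name'].fillna(name)

print(df['name'].tolist())  # ['John', 'John', 'John']
```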
Don't try to fill Series values one by one; this is inefficient and error-prone. A better idea is to create a list of names and assign it directly.
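import nltk
import numpy as np

L = []
for text in nametest_df['text']:
    persons = []  # PERSON tokens found in this row's text
    for sent in nltk.sent_tokenize(text):
        tokens = nltk.tokenize.word_tokenize(sent)
        tags = st.tag(tokens)  # st: the NER tagger from the question
        for token, label in tags:
            if label == 'PERSON':
                persons.append(token)
    # Append exactly one value per row so L lines up with the DataFrame
    L.append(' '.join(persons) if persons else np.nan)

nametest_df['name'] = L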
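To check the one-value-per-row idea without NLTK or a Stanford model installed, here is a sketch in which a hypothetical `fake_tag` function (tagging a fixed set of first names as PERSON) stands in for `st.tag`, and `str.split` stands in for NLTK tokenization. `extract_person` is likewise an illustrative helper, not part of any library.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for st.tag(): known first names are tagged PERSON, all else 'O'
KNOWN_NAMES = {'John', 'Fred'}

def fake_tag(tokens):
    return [(tok, 'PERSON' if tok in KNOWN_NAMES else 'O') for tok in tokens]

def extract_person(text, tag=fake_tag):
    """Return this text's PERSON tokens joined by spaces, or NaN if there are none."""
    persons = [tok for tok, label in tag(text.split()) if label == 'PERSON']
    return ' '.join(persons) if persons else np.nan

df = pd.DataFrame({
    'text': ['His name is John', 'I went to the beach', 'My friend is called Fred'],
})
# One value per row, assigned to the column in a single step
df['name'] = [extract_person(t) for t in df['text']]
print(df)
```

`df['text'].apply(extract_person)` would build the same column. Either way each row receives exactly one value, so names cannot drift out of alignment or overwrite other rows.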