Reputation: 121
I want to check the length of each sentence in a long dataframe column and build another dataframe that holds a replacement word repeated len(sentence) times, i.e. once per word in the sentence. Here is a sample of the sentences whose lengths I want to check:
"But for his attorney's incompetence",
'should there have been more supervision from the parents while their children were in the kitchen',
"If I didn't have the option of makeup (concealer)",
'if she had foregone insurance and printed off a savings coupon from the website GoodRx',
'provided that his name were Rand Smith, or his father were Ron Paul the car salesman rather than Ron Paul the almost-libertarian presidential candidate',
for ant in range(len(antecedents)):
    replace_tag = 'ante'  # the replacement word
    ant_to_string = ' '.join([str(elem) for elem in antecedents])  # convert to string
    get_words = ant_to_string.split(" ")  # split string
    phrase_tag.append(list((replace_tag,) * len(get_words)))  # multiply string for each word in the instance
df = pd.DataFrame(phrase_tag, columns=['labels'])  # fill in dataframe
Instead of a dataframe with 3550 rows, I get one with roughly 49,000 rows:
bound method NDFrame.sample of labels
0 ante
1 ante
2 ante
3 ante
4 ante
... ...
49583 [ante, ante, ante, ante, ante, ante, ante, ant...
49584 [ante, ante, ante, ante, ante, ante, ante, ant...
49585 [ante, ante, ante, ante, ante, ante, ante, ant...
49586 [ante, ante, ante, ante, ante, ante, ante, ant...
49587 [ante, ante, ante, ante, ante, ante, ante, ant...
What am I doing wrong?
Upvotes: 0
Views: 49
Reputation: 3010
Assuming antecedents is a column in a dataframe, you could do the following.
replace_tag = 'ante'
newcol = antecedents.apply(lambda x: [replace_tag] * len(x.split()))
Example
df = pd.DataFrame({'antecedents': ['I love ice cream', 'I hate ice cream more']})
replace_tag = 'ante'
df['antecedents'].apply(lambda x: [replace_tag] * len(x.split()))
=== Output: ===
0 [ante, ante, ante, ante]
1 [ante, ante, ante, ante, ante]
Name: antecedents, dtype: object
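As for what went wrong in the original loop: judging from the posted code, the likely cause is that ' '.join([str(elem) for elem in antecedents]) joins every sentence on every iteration, so the word count comes from the whole column (roughly 49,000 words) rather than from one sentence. Below is a minimal sketch of a per-sentence loop, assuming antecedents is an iterable of strings; the two-sentence list is only a stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: two of the sample sentences.
antecedents = [
    "But for his attorney's incompetence",
    "If I didn't have the option of makeup (concealer)",
]
replace_tag = 'ante'  # the replacement word

phrase_tag = []
for sentence in antecedents:
    # Split only the current sentence, not the whole list joined together.
    words = str(sentence).split()
    phrase_tag.append([replace_tag] * len(words))

# Passing a dict keeps each list in a single cell: one row per sentence.
df = pd.DataFrame({'labels': phrase_tag})
print(df)
```

Built this way, the dataframe has as many rows as there are sentences, and each cell holds one tag per word.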
Upvotes: 1