kay fresh

Reputation: 121

List does not transfer to Dataframe correctly

I want to check the length of each sentence in a long dataframe column and return another dataframe in which each row holds a replacement word repeated once per word in that sentence. This is a sample of the sentences whose lengths I want to check.

"But for his attorney's incompetence",
 'should there have been more supervision from the parents while their children were in the kitchen',
 "If I didn't have the option of makeup (concealer)",
 'if she had foregone insurance and printed off a savings coupon from the website GoodRx',
 'provided that his name were Rand Smith, or his father were Ron Paul the car salesman rather than Ron Paul the almost-libertarian presidential candidate',


for ant in range(len(antecedents)):
    replace_tag = 'ante'   #the replacement word
    ant_to_string = ' '.join([str(elem) for elem in antecedents])  #convert to string
    get_words = ant_to_string.split(" ")  #split string
    phrase_tag.append(list((replace_tag,) * len(get_words)))#multiply string for each word in the instance
df = pd.DataFrame(phrase_tag, columns=['labels'])#fill in dataframe

Instead of a dataframe of 3550 rows, I get a dataframe of roughly 49,000 rows:

bound method NDFrame.sample of                                                   labels
0                                                   ante
1                                                   ante
2                                                   ante
3                                                   ante
4                                                   ante
...                                                  ...
49583  [ante, ante, ante, ante, ante, ante, ante, ant...
49584  [ante, ante, ante, ante, ante, ante, ante, ant...
49585  [ante, ante, ante, ante, ante, ante, ante, ant...
49586  [ante, ante, ante, ante, ante, ante, ante, ant...
49587  [ante, ante, ante, ante, ante, ante, ante, ant...

What am I doing wrong?

Upvotes: 0

Views: 49

Answers (1)

Eric Truett

Reputation: 3010

Assuming antecedents is a column in a dataframe (a pandas Series), you can apply a function to each sentence as follows.

replace_tag = 'ante'
newcol = antecedents.apply(lambda x: [replace_tag] * len(x.split()))
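
If antecedents is a plain Python list rather than a Series (the loop in the question suggests it might be), a list comprehension should give the same result. This is only a minimal sketch: it assumes every element is a sentence string, and the two sample sentences are taken from the question.

```python
replace_tag = 'ante'

# Assumption: antecedents is a plain list of sentence strings.
antecedents = [
    "But for his attorney's incompetence",
    "If I didn't have the option of makeup (concealer)",
]

# Build one list of tags per sentence, with one tag per
# whitespace-separated word in that sentence.
phrase_tag = [[replace_tag] * len(str(sentence).split()) for sentence in antecedents]

# Number of tags produced for each sentence.
tag_counts = [len(tags) for tags in phrase_tag]
print(tag_counts)
```

The key difference from the loop in the question is that each sentence is split on its own, so the result has exactly one entry per sentence.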

Example

df = pd.DataFrame({'antecedents': ['I love ice cream', 'I hate ice cream more']})
replace_tag = 'ante'
df['antecedents'].apply(lambda x: [replace_tag] * len(x.split()))

=== Output: ===

0          [ante, ante, ante, ante]
1    [ante, ante, ante, ante, ante]
Name: antecedents, dtype: object
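
To end up with a separate dataframe that has one row per sentence, as the question asks, the result can be wrapped in a new dataframe. For context, the loop in the question joins the whole of antecedents on every pass instead of using the single element antecedents[ant], so every appended list is sized by the combined word count of all sentences. The following is a sketch under stated assumptions: pandas is imported as pd, and the column name 'labels' is borrowed from the question.

```python
import pandas as pd

df = pd.DataFrame({'antecedents': ['I love ice cream', 'I hate ice cream more']})
replace_tag = 'ante'

# One list of tags per sentence, sized by that sentence's word count.
labels = df['antecedents'].apply(lambda x: [replace_tag] * len(x.split()))

# Store the lists in a new one-column dataframe named 'labels'
# (column name assumed from the question).
out = pd.DataFrame({'labels': labels})

# The row count now matches the number of sentences, not the number of words.
print(len(out) == len(df))
```

Each cell of out['labels'] holds a list, which is why the row count stays equal to the number of sentences.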

Upvotes: 1
