SAPNONEXPERT

Reputation: 59

'list' object has no attribute 'apply', even though it should not be a list

df2 = pd.DataFrame(pd.read_csv("file.csv", delimiter=';', header=None, skiprows=1, engine='python', names=['I', 'II', 'III']))
df2["COMBINED"] = df2["I"].astype(str) + df2["II"].astype(str) + df2["III"].astype(str)

df2 = df2['COMBINED'].replace({'\$': '', ',': ''}, regex=True).str.lower()
df2 = [ nltk.word_tokenize( str(COMBINED) ) for COMBINED in df2 ]

a = df2.apply(set)

AttributeError: 'list' object has no attribute 'apply'

df2 = pd.DataFrame(pd.read_csv("file.csv", delimiter=';', header=None, skiprows=1, names=['I', 'II', 'III']))
df2["COMBINED"] = df2["I"].astype(str) + df2["II"].astype(str) + df1["III"].astype(str)
df2["COMBINED"] = df1["COMBINED"].str.replace(r'[^\w\s]+', '')
df2 = df2.COMBINED.apply(nltk.word_tokenize)
df2 = df2.apply(lambda x: [item.lower() for item in x if item.lower() not in stop_words])
a = df2.apply(set)

AttributeError: 'Series' object has no attribute 'intersection'

Does anyone have an idea how to get around these issues? I want to compute something like a dot product between two DataFrames of strings, i.e., compare each row of one DataFrame with each row of the other.

Upvotes: 0

Views: 2116

Answers (1)

Ynjxsjmh

Reputation: 30050

You don't need to wrap pd.read_csv() in pd.DataFrame(); pd.read_csv() already returns a DataFrame.
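A minimal sketch of that point. The in-memory csv_text is an assumption standing in for file.csv, whose contents are not shown in the question:

```python
import io

import pandas as pd

# Hypothetical stand-in for file.csv: one header row (skipped) and two data rows.
csv_text = "a;b;c\nfoo;bar;baz\nx;y;z\n"

# No pd.DataFrame(...) wrapper: read_csv already returns a DataFrame.
df2 = pd.read_csv(
    io.StringIO(csv_text),
    delimiter=";",
    header=None,
    skiprows=1,
    names=["I", "II", "III"],
)

print(type(df2))
print(list(df2.columns))
```

With a real file, pass the path "file.csv" in place of io.StringIO(csv_text).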

df2 = df2['COMBINED'].replace({'\$': '', ',': ''}, regex=True).str.lower()
# ^
# |
# Here df2 is a Series, from the original COMBINED column


df2 = [ nltk.word_tokenize( str(COMBINED) ) for COMBINED in df2 ]
# ^                                               ^
# |                                               |
# Here df2 is a list                            Each element in df2 Series

a = df2.apply(set)

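A Python list has no apply method; apply belongs to pandas objects such as Series and DataFrame. Once the list comprehension has replaced df2 with a list, df2.apply(set) can only fail.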
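One way around it is to tokenize with Series.apply instead of a list comprehension, so the result stays a Series. This is only a sketch: the sample strings are made up, and str.split is used as a simple stand-in for nltk.word_tokenize so the example runs without NLTK's tokenizer data.

```python
import pandas as pd

# Hypothetical sample data standing in for the COMBINED column.
combined = pd.Series(["Pay $1,000 now", "pay later"])

# Same cleaning step as in the question; the result is still a Series.
cleaned = combined.replace({r"\$": "", ",": ""}, regex=True).str.lower()

# Series.apply keeps the Series type; swap str.split for nltk.word_tokenize
# if NLTK and its tokenizer data are installed.
tokens = cleaned.apply(str.split)

# tokens is a Series of lists, so apply(set) works here.
a = tokens.apply(set)
print(a)
```

If you prefer the list comprehension, wrap its result in pd.Series(...) before calling apply.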

df2 = pd.read_csv("file.csv", delimiter=';', header=None, skiprows=1, names=['I', 'II', 'III'])
# ^
# |
# df2 has columns I, II, III

df2["COMBINED"] = df2["I"].astype(str) + df2["II"].astype(str) + df1["III"].astype(str)
# ^
# |
# You create a new column combined,
# df2 now has columns I, II, III, COMBINED

df2["COMBINED"] = df1["COMBINED"].str.replace(r'[^\w\s]+', '')
# ^
# |
# Do operations on COMBINED column

df2 = df2.sentence.apply(nltk.word_tokenize)
# ^
# |
# df2.sentence looks up a column named sentence, but no such column exists;
# df2 only has columns I, II, III, COMBINED, so use df2["COMBINED"] instead
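Putting it together, here is a hedged sketch of the row-by-row comparison you describe. The second error ('Series' object has no attribute 'intersection') presumably comes from calling intersection on the whole Series; intersection is a method of each individual set, so it has to be applied element by element. The sample frames, the helper name to_token_sets, and the use of str.split in place of nltk.word_tokenize are all assumptions for illustration. Note also that in pandas 2.0 and later, Series.str.replace treats the pattern as a literal string unless regex=True is passed.

```python
import pandas as pd

# Hypothetical sample frames; only the COMBINED column matters here.
df1 = pd.DataFrame({"COMBINED": ["red apple", "green pear"]})
df2 = pd.DataFrame({"COMBINED": ["Red, car!", "green apple"]})


def to_token_sets(frame):
    """Return a Series holding one set of lowercase tokens per row."""
    cleaned = (
        frame["COMBINED"]
        .str.replace(r"[^\w\s]+", "", regex=True)  # strip punctuation
        .str.lower()
    )
    # str.split stands in for nltk.word_tokenize.
    return cleaned.apply(str.split).apply(set)


sets1 = to_token_sets(df1)
sets2 = to_token_sets(df2)

# Compare each row of df2 with each row of df1: the cell at (i, j) counts
# the tokens shared by row i of df2 and row j of df1.
overlap = pd.DataFrame(
    [[len(s2 & s1) for s1 in sets1] for s2 in sets2],
    index=df2.index,
    columns=df1.index,
)
print(overlap)
```

The nested comprehension is quadratic in the number of rows, which is fine for small frames; replace len(s2 & s1) with s2 & s1 if you need the shared tokens themselves rather than their count.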

Upvotes: 2
