Reputation: 59
df2 = pd.DataFrame(pd.read_csv("file.csv", delimiter=';', header=None, skiprows=1, engine='python', names=['I', 'II', 'III']))
df2["COMBINED"] = df2["I"].astype(str) + df2["II"].astype(str) + df2["III"].astype(str)
df2 = df2['COMBINED'].replace({'\$': '', ',': ''}, regex=True).str.lower()
df2 = [ nltk.word_tokenize( str(COMBINED) ) for COMBINED in df2 ]
a = df2.apply(set)
AttributeError: 'list' object has no attribute 'apply'
df2 = pd.DataFrame(pd.read_csv("file.csv", delimiter=';', header=None, skiprows=1, names=['I', 'II', 'III']))
df2["COMBINED"] = df2["I"].astype(str) + df2["II"].astype(str) + df1["III"].astype(str)
df2["COMBINED"] = df1["COMBINED"].str.replace(r'[^\w\s]+', '')
df2 = df2.COMBINED.apply(nltk.word_tokenize)
df2 = df2.apply(lambda x: [item.lower() for item in x if item.lower() not in stop_words])
a = df2.apply(set)
AttributeError: 'Series' object has no attribute 'intersection'
Does anyone have an idea how to get around these issues? I want to generate something like a dot product between two dataframes of strings, i.e., compare each row of one with each row of the other df.
Upvotes: 0
Views: 2116
Reputation: 30050
You don't need pd.DataFrame() after pd.read_csv(); the return type of pd.read_csv() is already a DataFrame.
df2 = df2['COMBINED'].replace({'\$': '', ',': ''}, regex=True).str.lower()
# ^
# |
# Here df2 is a Series, from the original COMBINED column
df2 = [ nltk.word_tokenize( str(COMBINED) ) for COMBINED in df2 ]
# Left-hand df2: here df2 becomes a list
# COMBINED in df2: each element of the df2 Series
a = df2.apply(set)
A list definitely doesn't have an apply attribute.
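One way to keep the tokenized result as a Series, so that .apply(set) still works, is to tokenize with Series.apply instead of a list comprehension. A minimal sketch, assuming hypothetical sample strings and using str.split as a stand-in for nltk.word_tokenize so it runs without NLTK data:

```python
import pandas as pd

# Hypothetical stand-in for the cleaned, lower-cased COMBINED column
combined = pd.Series(["the quick brown fox", "the lazy dog"])

# Series.apply returns a Series, whereas a list comprehension returns a list.
# str.split is only a placeholder here; nltk.word_tokenize can be passed the same way.
tokens = combined.apply(str.split)

# Because tokens is still a Series, a second apply is available
token_sets = tokens.apply(set)
print(token_sets)
```

Alternatively, wrapping the existing list in pd.Series(...) should restore .apply as well.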
df2 = pd.read_csv("file.csv", delimiter=';', header=None, skiprows=1, names=['I', 'II', 'III'])
# ^
# |
# df2 has columns I, II, III
df2["COMBINED"] = df2["I"].astype(str) + df2["II"].astype(str) + df1["III"].astype(str)
# ^
# |
# You create a new column combined,
# df2 now has columns I, II, III, COMBINED
df2["COMBINED"] = df1["COMBINED"].str.replace(r'[^\w\s]+', '')
# ^
# |
# Do operations on COMBINED column
df2 = df2.sentence.apply(nltk.word_tokenize)
# ^
# |
# By using df2.sentence you are accessing sentence column,
# there is no sentence column, only columns I, II, III, COMBINED
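As for the second error, intersection is a method of the individual set objects, not of the Series that holds them, so it has to be called element by element. A rough sketch of comparing each row of one Series with each row of another, assuming hypothetical token sets a and b and assuming the count of shared tokens is the result you want:

```python
import pandas as pd

# Hypothetical token sets standing in for the two tokenized COMBINED columns
a = pd.Series([{"red", "apple"}, {"green", "pear"}])
b = pd.Series([{"red", "pear"}, {"blue", "plum"}, {"green", "apple"}])

# Row i, column j holds the number of tokens shared by a[i] and b[j].
# x & y is the set intersection of two individual sets.
overlap = pd.DataFrame(
    [[len(x & y) for y in b] for x in a],
    index=a.index,
    columns=b.index,
)
print(overlap)
```

The nested comprehension is quadratic in the number of rows, which is usually fine for small files but may be slow for large ones.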
Upvotes: 2