Reputation: 47
df = pd.DataFrame({'a': ['Anakin Ana', 'Anakin Ana, Chris Cannon', 'Chris Cannon', 'Bella Bold'],
                   'b': ['Bella Bold, Chris Cannon', 'Donald Deakon', 'Bella Bold', 'Bella Bold'],
                   'c': ['Chris Cannon', 'Chris Cannon, Donald Deakon', 'Chris Cannon', 'Anakin Ana, Bella Bold']},
                  index=[0, 1, 2, 3])
Hi everyone,
I'm trying to count how many names the columns have in common. Above is an example of what my data looks like. At first I got the error 'float' object has no attribute 'split'. After some searching, it seems the error comes from my missing data, which is read as float. But even when I try to convert the column to a string, I still get an error. Below is my code.
import pandas as pd
import csv
filepath = "C:/Users/data/Untitled Folder/creditdata2.csv"
df = pd.read_csv(filepath,encoding='utf-8')
df['word_overlap'] = [set(x[8].astype(str).split(",")) & set(x[10].astype(str).split(",")) for x in df.values]
df['overlap_count'] = df['word_overlap'].str.len()
df.to_csv('creditdata3.csv',mode='a',index=False)
And here is the error
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-21-b85ac8637aae> in <module>
4 df = pd.read_csv(filepath,encoding='utf-8')
5
----> 6 df['word_overlap'] = [set(x[8].astype(str).split(",")) & set(x[10].astype(str).split(",")) for x in df.values]
7 df['overlap_count'] = df['word_overlap'].str.len()
8
<ipython-input-21-b85ac8637aae> in <listcomp>(.0)
4 df = pd.read_csv(filepath,encoding='utf-8')
5
----> 6 df['word_overlap'] = [set(x[8].astype(str).split(",")) & set(x[10].astype(str).split(",")) for x in df.values]
7 df['overlap_count'] = df['word_overlap'].str.len()
8
AttributeError: 'float' object has no attribute 'astype'
Upvotes: 1
Views: 2529
Reputation: 47
import pandas as pd
import csv
filepath = "C:/data/Untitled Folder/creditdata2.csv"
df = pd.read_csv(filepath,encoding='utf-8')
def f(columns):
    f_desc, f_def = str(columns[6]), str(columns[7])
    common = set(f_desc.split(",")).intersection(set(f_def.split(",")))
    return common, len(common)
df[['word_overlap', 'word_count']] = df.apply(f, axis=1, raw=True).apply(pd.Series)
df.to_csv('creditdata3.csv',mode='a',index=False)
I found another way to do it. Thank you, everyone!
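Two details of the str(...).split(",") approach may be worth checking against real data: splitting "Anakin Ana, Chris Cannon" on "," leaves a leading space on " Chris Cannon", so it will not match "Chris Cannon", and str() turns a missing value into the text 'nan', so two missing cells would overlap on 'nan'. Below is a minimal sketch of one way to handle both, assuming names are comma-separated and a missing cell should count as no names. The helper names split_names and overlap are hypothetical, and the small frame reuses the example values from the question with one cell made missing.

```python
import pandas as pd

def split_names(cell):
    """Return the set of names in one cell; a missing cell gives an empty set."""
    if pd.isna(cell):
        return set()
    # Strip surrounding whitespace so " Chris Cannon" equals "Chris Cannon".
    return {name.strip() for name in str(cell).split(",") if name.strip()}

def overlap(left, right):
    """Names that appear in both cells."""
    return split_names(left) & split_names(right)

# Hypothetical sample based on the question's data; the last 'a' cell is missing.
df = pd.DataFrame({
    'a': ['Anakin Ana', 'Anakin Ana, Chris Cannon', 'Chris Cannon', None],
    'c': ['Chris Cannon', 'Chris Cannon, Donald Deakon', 'Chris Cannon', 'Anakin Ana, Bella Bold'],
})

df['word_overlap'] = [overlap(a, c) for a, c in zip(df['a'], df['c'])]
df['overlap_count'] = df['word_overlap'].apply(len)
print(df[['word_overlap', 'overlap_count']])
```

Selecting the columns by name instead of by position (columns[6], columns[7]) also keeps the code working if the column order in the CSV changes.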
Upvotes: 0
Reputation: 5803
astype is a method of DataFrame, and here you have just a primitive float, because you've already indexed x.
Try this:
df['word_overlap'] = [set(str(x[8]).split(",")) & set(str(x[10]).split(",")) for x in df.values]
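To see why the built-in str() works here, the following small sketch reads a hypothetical two-column CSV with one empty cell and inspects the value that comes back. The CSV text and the names csv_text, demo and missing are made up for illustration and are not the asker's data.

```python
import io
import pandas as pd

# Hypothetical CSV: the second row has an empty 'b' cell.
csv_text = "a,b\nAnakin Ana,Bella Bold\nChris Cannon,\n"
demo = pd.read_csv(io.StringIO(csv_text))

# Indexing a row of df.values yields the cell itself, not a pandas object.
missing = demo.values[1][1]

print(isinstance(missing, float))   # the empty cell is a float NaN
print(hasattr(missing, "split"))    # floats have no string methods
print(str(missing).split(","))      # ['nan']
```

Note that str() turns the missing value into the text 'nan', so rows where both columns are empty would be counted as sharing the name 'nan' unless missing values are filtered out first.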
Upvotes: 1