Reputation: 33
I am writing a function that returns a dictionary with the year of the documents as key and, as value, the tuple returned by the do_get_citations_per_year function.
This function processes the DataFrame:
def do_process_citation_data(f_path):
    global my_ocan
    my_ocan = pd.read_csv(f_path, names=['oci', 'citing', 'cited', 'creation', 'timespan', 'journal_sc', 'author_sc'],
                          parse_dates=['creation', 'timespan'])
    my_ocan = my_ocan.iloc[1:]  # to remove the first row
    my_ocan['creation'] = pd.to_datetime(my_ocan['creation'], format="%Y-%m-%d", yearfirst=True)
    my_ocan['timespan'] = my_ocan['timespan'].map(parse_timespan)
    #print(my_ocan.info())
    print(my_ocan['timespan'])
    return my_ocan
Then I have this function; running it on its own does not raise any error:
def do_get_citations_per_year(data, year):
    result = tuple()
    my_ocan['creation'] = pd.DatetimeIndex(my_ocan['creation']).year
    len_citations = len(my_ocan.loc[my_ocan["creation"] == year, "creation"])
    timespan = round(my_ocan.loc[my_ocan["creation"] == year, "timespan"].mean())
    result = (len_citations, timespan)
    print(result)
    return result
When I run that function inside of another function:
def do_get_citations_all_years(data):
    mydict = {}
    s = set(my_ocan.creation)
    for year in s:
        mydict[year] = do_get_citations_per_year(data, year)
    return mydict
I get the error:
  File "/Users/lisa/Desktop/yopy/execution_example.py", line 28, in <module>
    print(my_ocan.get_citations_all_years())
  File "/Users/lisa/Desktop/yopy/ocan.py", line 35, in get_citations_all_years
    return do_get_citations_all_years(self.data)
  File "/Users/lisa/Desktop/yopy/lisa.py", line 112, in do_get_citations_all_years
    mydict[year] = do_get_citations_per_year(data, year)
  File "/Users/lisa/Desktop/yopy/lisa.py", line 99, in do_get_citations_per_year
    timespan = round(my_ocan.loc[my_ocan["creation"] == year, "timespan"].mean())
ValueError: cannot convert float NaN to integer
What can I do to solve the issue?
Thank you in advance
Upvotes: 0
Views: 1264
Reputation: 627
@Ha Bom, filling with zeros will change the mean. I guess the solution would be to drop the rows with NaN instead:
timespan = my_ocan.loc[my_ocan["creation"] == year, "timespan"].dropna().mean()
If you do not want to drop any rows, then you will want to fillna with the mean, for example; see this Stack Overflow question for an example.
Edit: @Ha Bom's solution was good, seeing that the point was to replace the mean by zero.
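To make the difference between the two approaches concrete, here is a small sketch. The Series and its values are made up for illustration and are not taken from the asker's data. It also shows that pandas skips NaN by default when computing a mean, so dropna() mainly makes the intent explicit.

```python
import pandas as pd

# Hypothetical timespans for a single year; one value is missing.
timespan = pd.Series([10.0, 20.0, None])

mean_default = timespan.mean()           # NaN is skipped by default (skipna=True)
mean_dropna = timespan.dropna().mean()   # same value, with the drop made explicit
mean_fillna = timespan.fillna(0).mean()  # the zero is counted as a real observation

print(mean_default, mean_dropna, mean_fillna)
```

With these made-up values the first two means agree, while the fillna(0) version is pulled down by the extra zero.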
Upvotes: 0
Reputation: 2917
This error means that my_ocan.loc[my_ocan["creation"] == year, "timespan"].mean() is NaN.
You should fill NaN values with 0 before calculating the mean because it will not change the mean. Here is an example:
timespan = my_ocan.loc[my_ocan["creation"] == year, "timespan"].fillna(0).mean()
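One caveat worth checking: if the filter matches no rows at all, the selection is empty, and fillna(0) has nothing to fill, so the mean is still NaN and round() still fails. A minimal sketch of a guard follows; the DataFrame, the function name citations_per_year, and the choice of None as a placeholder are all assumptions for illustration, not the asker's code.

```python
import math
import pandas as pd

# Hypothetical data: 'creation' is assumed to already hold integer years.
df = pd.DataFrame({"creation": [2018, 2018, 2019],
                   "timespan": [100.0, 200.0, 50.0]})

def citations_per_year(frame, year):
    """Return (number of citations, rounded mean timespan or None)."""
    selected = frame.loc[frame["creation"] == year, "timespan"]
    mean_value = float(selected.mean())  # NaN if nothing matches or all values are missing
    if math.isnan(mean_value):
        return (len(selected), None)     # placeholder instead of calling round() on NaN
    return (len(selected), round(mean_value))

result_2018 = citations_per_year(df, 2018)
result_2020 = citations_per_year(df, 2020)  # no rows for this year
print(result_2018)
print(result_2020)
```

Whether None, 0, or skipping the year entirely is the right fallback depends on how the resulting dictionary is used.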
Upvotes: 1