Reputation: 25366
I have the following code, trying to find the hour of the 'Dates' column in a data frame:
print(df['Dates'].head(3))
df['hour'] = df.apply(lambda x: find_hour(x['Dates']), axis=1)
def find_hour(self, input):
    return input[11:13].astype(float)
where the output of print(df['Dates'].head(3)) looks like:
0 2015-05-13 23:53:00
1 2015-05-13 23:53:00
2 2015-05-13 23:33:00
However, I got the following error:
df['hour'] = df.apply(lambda x: find_hour(x['Dates']), axis=1)
NameError: ("global name 'find_hour' is not defined", u'occurred at index 0')
Does anyone know what I missed? Thanks!
Note that if I put the function directly in the lambda line like below, everything works fine:
df['hour'] = df.apply(lambda x: x['Dates'][11:13], axis=1).astype(float)
Upvotes: 3
Views: 20825
Reputation: 210842
What is wrong with the good old .dt.hour?
In [202]: df
Out[202]:
                 Date
0 2015-05-13 23:53:00
1 2015-05-13 23:53:00
2 2015-05-13 23:33:00
In [217]: df['hour'] = df.Date.dt.hour
In [218]: df
Out[218]:
                 Date  hour
0 2015-05-13 23:53:00    23
1 2015-05-13 23:53:00    23
2 2015-05-13 23:33:00    23
If your Date column is of string type, you may want to convert it to datetime first:
df.Date = pd.to_datetime(df.Date)
or just:
df['hour'] = df.Date.str[11:13].astype(int)
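For anyone who wants to try both routes side by side, here is a minimal sketch. It assumes the Date column holds strings in the "YYYY-MM-DD HH:MM:SS" layout shown above; the sample frame and the names parsed, hour_from_dt and hour_from_str are made up for illustration.

```python
import pandas as pd

# Hypothetical sample frame mirroring the data above, stored as plain strings.
df = pd.DataFrame({
    "Date": [
        "2015-05-13 23:53:00",
        "2015-05-13 23:53:00",
        "2015-05-13 23:33:00",
    ]
})

# Route 1: parse the strings to datetime, then read the hour via the .dt accessor.
parsed = pd.to_datetime(df["Date"])
hour_from_dt = parsed.dt.hour

# Route 2: slice characters 11-12 of each string and cast the whole Series.
# The built-in int() cannot convert a Series, so astype(int) is used instead.
hour_from_str = df["Date"].str[11:13].astype(int)
```

The datetime route is usually the safer of the two, since the slice silently depends on the exact string layout.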
Upvotes: 7
Reputation: 20336
You are trying to use find_hour before it has been defined. You just need to switch things around:
def find_hour(self, input):
    return input[11:13].astype(float)
print(df['Dates'].head(3))
df['hour'] = df.apply(lambda x: find_hour(x['Dates']), axis=1)
Edit: Padraic has pointed out a very important point: find_hour() is defined as taking two arguments, self and input, but you are giving it only one. You should define find_hour() as def find_hour(input):, except that naming the argument input shadows the built-in function. You might consider renaming it to something a little more descriptive.
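Putting those points together, a corrected version might look like the sketch below. It assumes the Dates column holds strings, which the working lambda in the question suggests; the parameter name timestamp_text and the sample frame are hypothetical. A plain Python str has no astype method, so the conversion uses float() instead.

```python
import pandas as pd

# Define the helper before it is used, with a single, descriptively named parameter.
def find_hour(timestamp_text):
    # Characters 11-12 of "YYYY-MM-DD HH:MM:SS" hold the hour.
    return float(timestamp_text[11:13])

# Hypothetical sample frame mirroring the question's data, stored as strings.
df = pd.DataFrame({
    "Dates": [
        "2015-05-13 23:53:00",
        "2015-05-13 23:53:00",
        "2015-05-13 23:33:00",
    ]
})

# Applying over the single column avoids the row-wise lambda entirely.
df["hour"] = df["Dates"].apply(find_hour)
```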
Upvotes: 10