I am trying to build a new DataFrame based on user-set values for the date variables below. The raw data's date column (Date) comes into pandas in the format 7/5/17. Following what I assume are best practices, I convert the field to datetime format, which produces an array in yyyy-mm-dd format, such as '2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04', '2017-12-05'. From here I am trying to subselect my DataFrame with a date_range between my start and end dates, and then show only the columns selected by the variables X and y. However, the subselect line raises KeyError('{mask} not in index'.format(mask=objarr[mask])). What value in my code could be throwing that error? Is it due to the datetime formatting?
import pandas as pd

# date column and conversion to datetime64[ns]
dateColumn = pd.to_datetime(rawData['Date'])
# date start
dateStart = '12/1/17'
# date end
dateEnd = '2/28/18'
# date range
dateRange = pd.date_range(dateStart, dateEnd)
# dependent variable
y = 'Leads'
# independent variable(s)
X = 'Clicks'
Sub-select x and y columns for Date rows between 12/1/17 and 2/28/18:
print(rawData[rawData[dateColumn].isin(dateRange)][X,y])
Reputation: 23980
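You are indexing with the column itself instead of the column name. dateColumn is already a Series of datetimes, so rawData[dateColumn] tries to look up those datetime values as column labels, which raises the KeyError. Call isin on the Series directly to build the boolean mask, and pass the column names as a list ([[X, y]]) to select both columns: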
print(rawData[dateColumn.isin(dateRange)][[X,y]])
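For anyone who wants to try this without the original data, here is a minimal sketch of the same fix. The sample rows in rawData are made up for illustration, and the explicit format="%m/%d/%y" is an assumption about how the dates should be parsed; .loc and Series.between are shown as optional alternatives, not as the only way to do it.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's rawData
rawData = pd.DataFrame({
    "Date": ["11/30/17", "12/1/17", "1/15/18", "2/28/18", "3/1/18"],
    "Clicks": [10, 20, 30, 40, 50],
    "Leads": [1, 2, 3, 4, 5],
})

# Parse with an explicit format (assumed month/day/two-digit year)
dateColumn = pd.to_datetime(rawData["Date"], format="%m/%d/%y")
dateStart = pd.to_datetime("12/1/17", format="%m/%d/%y")
dateEnd = pd.to_datetime("2/28/18", format="%m/%d/%y")
dateRange = pd.date_range(dateStart, dateEnd)

X = "Clicks"  # independent variable
y = "Leads"   # dependent variable

# The original mistake: a Series of datetimes used as if it were column labels
try:
    rawData[dateColumn]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# The fix: use the Series to build a boolean mask, then pick columns with a list
mask = dateColumn.isin(dateRange)
subset = rawData.loc[mask, [X, y]]

# Alternative mask without materialising every date in the range;
# Series.between includes both endpoints by default
mask_between = dateColumn.between(dateStart, dateEnd)

print(subset)
```

Using a single .loc call with the mask and the column list avoids chained indexing, and between does the same row selection without building the full date range.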