Reputation: 1005
I have a problem with Pandas, Python. I have several rows with different dates, where the dates are String
"2016-02-28" ABC123
"2016-02-29" CDE345
"2016-03-30" FGH567
"2016-03-31" XYZ235
...
Here we see that feb has two different days, 28 and 29. I am only interested in the month. Thus, I want these rows to have the same day, like this:
"2016-02-29" ABC123
"2016-02-29" CDE345
"2016-03-31" FGH567
"2016-03-31" XYZ235
...
It does not really matter which day they get, as long as it is the same day but preferable the last day. I cannot truncate and only keep "2016-02" because I later need the day. I can convert it timestamp if it becomes easier.
df ["DATE"] = pandas.to_datetime (df ["DATE"])
(Another question, why does this line convert the DATE column to Timestamp instead of datetime?? It says to convert to datetime, but instead it becomes Timestamp?)
I have tried to resample, but to no avail. I dont want to do this manually, by cutting and pasting Strings as I have done earlier. There should be a more elegant solution?
Upvotes: 1
Views: 73
Reputation: 323226
Or we using to_period
df.index=df.index.to_period('M').to_timestamp('M')
df
Out[16]:
A
2016-02-29 ABC123
2016-02-29 CDE345
2016-03-31 FGH567
2016-03-31 XYZ235
Upvotes: 0
Reputation: 862406
Use MonthEnd
offset:
df["DATE"] = pd.to_datetime (df["DATE"]) + pd.offsets.MonthEnd(0)
print (df)
DATE COL
0 2016-02-29 ABC123
1 2016-02-29 CDE345
2 2016-03-31 FGH567
3 2016-03-31 XYZ235
If really big DataFrame and performance is important:
df['DATE'] = pd.to_datetime(df["DATE"]).values.astype('datetime64[M]') + \
np.array([1], dtype='timedelta64[M]') - \
np.array([1], dtype='timedelta64[D]')
print (df)
DATE COL
0 2016-02-29 ABC123
1 2016-02-29 CDE345
2 2016-03-31 FGH567
3 2016-03-31 XYZ235
(Another question, why does this line convert the DATE column to Timestamp instead of datetime?? It says to convert to datetime, but instead it becomes Timestamp?)
If you see the docs here it says :
TimeStamp is the pandas equivalent of python’s Datetime and is interchangable with it in most cases. It’s the type used for the entries that make up a DatetimeIndex, and other timeseries oriented data structures in pandas.
Upvotes: 2