Orvar Korvar
Orvar Korvar

Reputation: 1005

How to change dates in column?

I have a problem with Pandas, Python. I have several rows with different dates, where the dates are String

"2016-02-28"   ABC123
"2016-02-29"   CDE345
"2016-03-30"   FGH567
"2016-03-31"   XYZ235
...

Here we see that feb has two different days, 28 and 29. I am only interested in the month. Thus, I want these rows to have the same day, like this:

"2016-02-29"   ABC123
"2016-02-29"   CDE345
"2016-03-31"   FGH567
"2016-03-31"   XYZ235
...

It does not really matter which day they get, as long as it is the same day but preferable the last day. I cannot truncate and only keep "2016-02" because I later need the day. I can convert it timestamp if it becomes easier.

df ["DATE"] = pandas.to_datetime (df ["DATE"])

(Another question, why does this line convert the DATE column to Timestamp instead of datetime?? It says to convert to datetime, but instead it becomes Timestamp?)

I have tried to resample, but to no avail. I dont want to do this manually, by cutting and pasting Strings as I have done earlier. There should be a more elegant solution?

Upvotes: 1

Views: 73

Answers (2)

BENY
BENY

Reputation: 323226

Or we using to_period

df.index=df.index.to_period('M').to_timestamp('M')
df
Out[16]: 
                 A
2016-02-29  ABC123
2016-02-29  CDE345
2016-03-31  FGH567
2016-03-31  XYZ235

Upvotes: 0

jezrael
jezrael

Reputation: 862406

Use MonthEnd offset:

df["DATE"] = pd.to_datetime (df["DATE"]) + pd.offsets.MonthEnd(0)
print (df)
        DATE     COL
0 2016-02-29  ABC123
1 2016-02-29  CDE345
2 2016-03-31  FGH567
3 2016-03-31  XYZ235

If really big DataFrame and performance is important:

df['DATE'] = pd.to_datetime(df["DATE"]).values.astype('datetime64[M]') + \
                            np.array([1], dtype='timedelta64[M]') - \
                            np.array([1], dtype='timedelta64[D]')
print (df)
        DATE     COL
0 2016-02-29  ABC123
1 2016-02-29  CDE345
2 2016-03-31  FGH567
3 2016-03-31  XYZ235

(Another question, why does this line convert the DATE column to Timestamp instead of datetime?? It says to convert to datetime, but instead it becomes Timestamp?)

If you see the docs here it says :

TimeStamp is the pandas equivalent of python’s Datetime and is interchangable with it in most cases. It’s the type used for the entries that make up a DatetimeIndex, and other timeseries oriented data structures in pandas.

Upvotes: 2

Related Questions