Reputation: 745
I have a csv that has incorrect datetime formatting. I've worked out how to convert those values into the format I need, but now I need to reassign all the values in a column to the new, converted values.
For example, I'm hoping that there's something I can put into the following FOR loop that will insert the values back into the dataframe at the correct location:
for i in df[df.columns[1]]:
    t = pd.Timestamp(i)
    short_date = t.date().strftime('%m/%d/%Y').lstrip('0')
    # Insert back into dataframe?
As always, your help is very much appreciated!
Part of the dataframe in question:
Created Date
2019-02-27 22:55:16
2019-01-29 22:57:12
2018-11-29 00:13:31
2019-01-30 21:35:15
2018-12-20 21:14:45
2018-11-01 16:20:15
2019-04-11 16:38:07
2019-01-24 00:23:17
2018-12-21 19:30:10
2018-12-19 22:33:04
2018-11-07 19:54:19
2019-05-10 21:15:00
Upvotes: 0
Views: 4092
Reputation: 745
Thank you all for your help. All of the answers were helpful, but the answer I ended up using was as follows:
import pandas as pd
df[df.columns[0]] = pd.to_datetime(df[df.columns[0]]).dt.strftime('%m/%d/%Y')
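As a quick check, here is that one-liner applied to a few of the values from the question. This is only a sketch: the single-column frame is built by hand (my real data comes from a csv), so the column sits at position 0, and the last step, which strips the leading zero to match the lstrip('0') in my question, is an optional extra rather than part of the line above.

```python
import pandas as pd

# Hand-built sample using values from the question (an assumption for illustration).
df = pd.DataFrame({"Created Date": ["2019-02-27 22:55:16",
                                    "2018-11-29 00:13:31",
                                    "2019-05-10 21:15:00"]})

col = df.columns[0]
# Parse the strings, then write the formatted strings back into the same column.
df[col] = pd.to_datetime(df[col]).dt.strftime("%m/%d/%Y")
print(df[col].tolist())   # ['02/27/2019', '11/29/2018', '05/10/2019']

# Optional: drop the leading zero of the month, as in the question.
stripped = df[col].str.lstrip("0")
print(stripped.tolist())  # ['2/27/2019', '11/29/2018', '5/10/2019']
```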
Upvotes: 0
Reputation: 4664
To reassign a column, no need for a loop. Something like this should work:
df["column"] = new_column
new_column is either a Series of matching length, or something that can be broadcast [1] to that length. You can find more details in the docs.
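A small sketch of both cases, using a made-up three-row frame (the column names and values are placeholders, not the asker's data):

```python
import pandas as pd

df = pd.DataFrame({"column": [1, 2, 3]})

# A Series of matching length replaces the column row by row
# (pandas aligns it on the index, which is 0, 1, 2 on both sides here).
df["column"] = pd.Series([10, 20, 30])

# A scalar is broadcast to every row.
df["flag"] = 0

print(df)
```

Because the assignment aligns on the index, a Series with a different index would fill unmatched rows with NaN; passing a plain list or array of the right length avoids that.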
That said, if pd.Timestamp can already parse your data, there is no need for "formatting". The formatting is not associated with a timestamp instance. You can choose a particular formatting when you are converting to string with something like df["timestamp"].dt.strftime("%m/%d/%Y").
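To make the distinction concrete, here is a minimal sketch with a hypothetical "timestamp" column built from two of the values in the question. The parsed column stays a datetime column; only the result of strftime is text.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2019-02-27 22:55:16", "2019-01-29 22:57:12"])
})

# Formatting produces a new Series of strings; the datetime column is untouched.
as_text = df["timestamp"].dt.strftime("%m/%d/%Y")

print(pd.api.types.is_datetime64_any_dtype(df["timestamp"]))  # True
print(as_text.tolist())                                       # ['02/27/2019', '01/29/2019']
```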
On the other hand, if you want to change the precision of your timestamp, you can do something like this:
df["timestamp"] = df["timestamp"].astype("datetime64[D]")
Here, all time information is truncated to a resolution of days. The letter between the [ and ] is the resolution. Note that pandas 2.0 and later accept only the s, ms, us and ns units in astype and raise an error for D. Again, all this and more is discussed in the docs.
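Day-level truncation can also be written with the .dt accessor, which does not depend on which units astype accepts. A minimal sketch, again with a hypothetical "timestamp" column:

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2019-02-27 22:55:16", "2018-11-29 00:13:31"])
})

# Floor every value to midnight of its own day; the column stays a datetime column.
df["timestamp"] = df["timestamp"].dt.floor("D")

print(df["timestamp"].tolist())
```

The values still display with a date only because the time part is now 00:00:00; they are timestamps, not strings.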
[1] Broadcasting is a concept from numpy where you can operate between different but compatibly shaped arrays. Again, everything is covered in the docs.
Upvotes: 1
Reputation: 858
In the simplest, but most instructive, possible terms:
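import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
df
#    x  y
# 0  1  4
# 1  2  5
# 2  3  6
# astype returns a new frame; rebinding df keeps the converted dtype.
# (df[:] = ... writes into the existing integer columns and may not change the dtype in recent pandas.)
df = df.astype(float)
df
#      x    y
# 0  1.0  4.0
# 1  2.0  5.0
# 2  3.0  6.0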
Let pandas do the work for you.
Or, for only one column:
df.x = df.x.astype(float)
df
#      x  y
# 0  1.0  4
# 1  2.0  5
# 2  3.0  6
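You'll, of course, replace astype(float) with the column-wise equivalent of .date().strftime('%m/%d/%Y').lstrip('0'). Those are methods of a single Timestamp and a single string, so on a datetime column they go through the accessors: .dt.strftime('%m/%d/%Y').str.lstrip('0').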
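If you would rather keep the exact per-value expression from the question, one option is to wrap it in a function and map it over the column. This is a sketch under assumptions: the helper name short_date and the two-row sample frame are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Created Date": ["2019-02-27 22:55:16", "2018-11-01 16:20:15"]})

def short_date(value):
    # Same expression as in the question, applied to one value at a time.
    return pd.Timestamp(value).date().strftime("%m/%d/%Y").lstrip("0")

# map calls short_date on every element and returns a new Series,
# which is then assigned back to the same column.
df["Created Date"] = df["Created Date"].map(short_date)

print(df["Created Date"].tolist())  # ['2/27/2019', '11/01/2018']
```

Note that lstrip('0') only removes zeros at the very start of the string, so the month loses its leading zero but the day does not.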
Upvotes: 1