clines

Reputation: 745

Reassign Pandas DataFrame column values

I have a csv that has incorrect datetime formatting. I've worked out how to convert those values into the format I need, but now I need to reassign all the values in a column to the new, converted values.

For example, I'm hoping there's something I can put into the following for loop that will insert each converted value back into the DataFrame at the correct location:

for i in df[df.columns[1]]:
    t = pd.Timestamp(i)
    short_date = t.date().strftime('%m/%d/%Y').lstrip('0')   
    # Insert back into dataframe?
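If you do want to keep the loop, one way to fill in that last line is to iterate over index/value pairs with Series.items() and write each result back with DataFrame.at. This is a sketch; the small sample frame and its "Id" column are made up for illustration, with the dates in the second column as in the snippet above.

```python
import pandas as pd

# Hypothetical sample frame; the date column is the second column, as in the question
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Created Date": ["2019-02-27 22:55:16", "2019-01-29 22:57:12", "2018-11-29 00:13:31"],
})

col = df.columns[1]
for idx, value in df[col].items():
    t = pd.Timestamp(value)
    # lstrip('0') only removes a leading zero from the month, not from the day
    short_date = t.date().strftime('%m/%d/%Y').lstrip('0')
    # Write the converted value back to the same row of the same column
    df.at[idx, col] = short_date

print(df)
```

This works because the column holds strings. If the column had already been parsed as datetime64 (for example via read_csv with parse_dates), writing strings into it row by row is not a good fit; a whole-column assignment avoids both that problem and the Python-level loop.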

As always, your help is very much appreciated!

Part of the column in question:

Created Date    
2019-02-27 22:55:16    
2019-01-29 22:57:12    
2018-11-29 00:13:31    
2019-01-30 21:35:15
2018-12-20 21:14:45    
2018-11-01 16:20:15    
2019-04-11 16:38:07    
2019-01-24 00:23:17    
2018-12-21 19:30:10    
2018-12-19 22:33:04    
2018-11-07 19:54:19    
2019-05-10 21:15:00

Upvotes: 0

Views: 4092

Answers (3)

clines

Reputation: 745

Thank you all for your help. All of the answers were helpful, but the answer I ended up using was as follows:

import pandas as pd 

df[df.columns[0]] = pd.to_datetime(df[df.columns[0]]).dt.strftime('%m/%d/%Y')
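Applied to a few of the sample values from the question (the one-column frame here is an assumption for illustration), this one-liner produces zero-padded strings. Unlike the loop in the question, it keeps the leading zero on the month.

```python
import pandas as pd

df = pd.DataFrame({"Created Date": ["2019-02-27 22:55:16", "2018-11-29 00:13:31", "2019-05-10 21:15:00"]})

# Parse the strings to datetime64, then render each value as MM/DD/YYYY text
df[df.columns[0]] = pd.to_datetime(df[df.columns[0]]).dt.strftime('%m/%d/%Y')

print(df)
```

Appending .str.lstrip('0') to the expression reproduces the question's style, turning 02/27/2019 into 2/27/2019.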

Upvotes: 0

suvayu

Reputation: 4664

To reassign a column, no need for a loop. Something like this should work:

df["column"] = new_column

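new_column can be a Series (which is aligned on the index labels, not by position), a list or array whose length matches the number of rows, or a scalar that is broadcast to that length (see footnote 1). You can find more details in the docs.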
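A small sketch of the three kinds of right-hand side (the frame and column names are made up). A scalar is broadcast, a list is matched by position, and a Series is matched by index label, which is easy to overlook.

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30]}, index=[0, 1, 2])

# A scalar is broadcast to every row
df["const"] = 0

# A list (or NumPy array) must have exactly len(df) elements and is assigned by position
df["from_list"] = [1, 2, 3]

# A Series is aligned on the index: label 2 is missing here, so that row gets NaN
s = pd.Series([100, 200], index=[1, 0])
df["from_series"] = s

print(df)
```

Assigning a list of the wrong length raises a ValueError instead of silently padding or truncating.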

That said, if pd.Timestamp can already parse your data, there is no need to reformat the stored values. A timestamp does not carry a display format; a format is only chosen when you convert it to a string, for example with df["timestamp"].dt.strftime("%m/%d/%Y"). The .dt accessor requires a datetime column, so convert with pd.to_datetime first if the column still holds strings.
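A minimal sketch of that separation, using two of the sample values from the question: the parsed column has a datetime dtype, and different string formats can be produced from the same values on demand.

```python
import pandas as pd

ts = pd.to_datetime(pd.Series(["2019-02-27 22:55:16", "2018-11-29 00:13:31"]))

# The parsed values are datetime64 and carry no display format of their own
print(pd.api.types.is_datetime64_any_dtype(ts))

# A format is only applied when converting to strings, and you can pick any format
us_style = ts.dt.strftime("%m/%d/%Y")
iso_style = ts.dt.strftime("%Y-%m-%d")
print(us_style.tolist())
print(iso_style.tolist())
```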

On the other hand, if you want to change the precision of your timestamp, you can do something like this:

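df["timestamp"] = df["timestamp"].dt.floor("D")

Here, all time-of-day information is truncated (not rounded), leaving a resolution of days. The argument is a frequency string, with "D" meaning calendar day. Older answers often use df["timestamp"].astype("datetime64[D]") for this, but pandas 2.0 and later only supports second, millisecond, microsecond and nanosecond resolutions for datetime64 columns and rejects that cast. Again, all this and more is discussed in the docs.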
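A short sketch of dropping the time-of-day part while keeping a datetime dtype, using Series.dt.floor("D") and the related Series.dt.normalize() on two sample values from the question:

```python
import pandas as pd

df = pd.DataFrame({"timestamp": pd.to_datetime(["2019-02-27 22:55:16", "2018-11-29 00:13:31"])})

# floor("D") truncates each value down to midnight of the same day
floored = df["timestamp"].dt.floor("D")

# normalize() gives the same result for timezone-naive timestamps
normalized = df["timestamp"].dt.normalize()

print(floored.tolist())
```

If you need plain datetime.date objects instead, df["timestamp"].dt.date returns them in an object-dtype column, at the cost of losing the datetime64 dtype.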


1 Broadcasting is a NumPy concept that lets you operate on arrays of different but compatible shapes; assigning a single scalar to a whole column is the simplest case. Again, everything is covered in the docs.

Upvotes: 1

Mike

Reputation: 858

In the simplest, but most instructive, possible terms:

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
df
#    x  y
# 0  1  4
# 1  2  5
# 2  3  6

df = df.astype(float)
df
#      x    y
# 0  1.0  4.0
# 1  2.0  5.0
# 2  3.0  6.0

Let pandas do the work for you.

Or, for only one column:

df["x"] = df["x"].astype(float)
df
#      x  y
# 0  1.0  4
# 1  2.0  5
# 2  3.0  6

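In your case, replace astype(float) with a vectorised form of your loop body. A Series has no .date() method, so use the .dt and .str accessors instead: pd.to_datetime(df[col]).dt.strftime('%m/%d/%Y').str.lstrip('0').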
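The loop body works on one value at a time, so to reuse it on the whole column it has to be applied element-wise, or rewritten with the .dt and .str accessors. A sketch of both routes, assuming the dates are in a column named "Created Date" as in the question's sample:

```python
import pandas as pd

df = pd.DataFrame({"Created Date": ["2019-02-27 22:55:16", "2018-12-20 21:14:45", "2019-05-10 21:15:00"]})

# Option 1: apply the question's per-value logic to every element with map()
via_map = df["Created Date"].map(
    lambda v: pd.Timestamp(v).date().strftime('%m/%d/%Y').lstrip('0')
)

# Option 2: vectorised equivalent using the .dt and .str accessors
vectorised = pd.to_datetime(df["Created Date"]).dt.strftime('%m/%d/%Y').str.lstrip('0')

# Either result can be assigned straight back to the column
df["Created Date"] = vectorised
print(df)
```

Both produce the same strings; the accessor version avoids calling Python code once per row, which matters on large columns.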

Upvotes: 1
