Reputation: 133
I am fairly new to Python and pandas.
I have a rather large dataset (~500,000 rows). The first column contains the date and time in this form:
created_at
0 Sun Jul 26 04:06:58 +0000 2020
1 Sun Jul 26 04:08:22 +0000 2020
2 Sun Jul 26 04:24:10 +0000 2020
3 Sun Jul 26 04:27:10 +0000 2020
As a first step I would like to trim that to only the month and day to get a result like this:
created_at
0 Jul 26
1 Jul 26
2 Jul 26
3 Jul 26
Ideally I would like to have it like this in the end:
created_at
0 07_26
1 07_26
2 07_26
3 07_26
Can anyone help me with some efficient methods to do that? I would really appreciate any help!
Upvotes: 1
Views: 303
Reputation: 17368
In [28]: df = pd.read_csv('a.csv')
In [29]: df['created_at'] = pd.to_datetime(df['created_at'])
In [30]: df
Out[30]:
created_at
0 2020-07-26 04:06:58+00:00
1 2020-07-26 04:08:22+00:00
2 2020-07-26 04:24:10+00:00
3 2020-07-26 04:27:10+00:00
In [31]: df.dtypes
Out[31]:
created_at datetime64[ns, UTC]
dtype: object
In [32]: df.created_at.dt.strftime("%b %d")
Out[32]:
0 Jul 26
1 Jul 26
2 Jul 26
3 Jul 26
Name: created_at, dtype: object
In [33]: df.created_at.dt.strftime("%m_%d")
Out[33]:
0 07_26
1 07_26
2 07_26
3 07_26
Name: created_at, dtype: object
Alternatively, you can pass the parse_dates parameter when loading the CSV file itself:
df = pd.read_csv('a.csv', parse_dates=['created_at'])
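Since the question asks about efficiency on ~500,000 rows, it may help to tell pandas the exact layout of the timestamps instead of letting it infer one. The following is only a sketch: the format string is my reading of the sample rows in the question, and the month_day column name is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"created_at": [
    "Sun Jul 26 04:06:58 +0000 2020",
    "Sun Jul 26 04:08:22 +0000 2020",
]})

# Assumed layout: weekday, month name, day, time, UTC offset, year.
fmt = "%a %b %d %H:%M:%S %z %Y"

# An explicit format skips per-value format guessing during parsing.
parsed = pd.to_datetime(df["created_at"], format=fmt)

# "month_day" is a hypothetical column name used only in this sketch.
df["month_day"] = parsed.dt.strftime("%m_%d")
print(df["month_day"])
```

Note that %a and %b match English day and month names only under an English (or C) locale, so this assumes the data and the environment agree on that.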
Upvotes: 1
Reputation: 862791
Use parse_dates with the column name in read_csv, and then use Series.dt.strftime for the custom format:
df = pd.read_csv('file', parse_dates=['created_at'])
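# first format, e.g. "Jul 26"
df['created_at'] = df['created_at'].dt.strftime("%b %d")
# OR second format, e.g. "07_26"
# (run only one of the two: strftime returns strings, so .dt will not work a second time)
df['created_at'] = df['created_at'].dt.strftime('%m_%d')
print(df)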
created_at
0 07_26
1 07_26
2 07_26
3 07_26
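If only the first result ("Jul 26") is needed, plain string slicing might be enough and avoids date parsing entirely. This is a different technique from strftime, shown here as a sketch: it assumes every row has exactly the fixed-width layout of the sample rows (three-letter weekday, three-letter month, two-digit day), and month_day is again a made-up column name.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"created_at": [
    "Sun Jul 26 04:06:58 +0000 2020",
    "Sun Jul 26 04:08:22 +0000 2020",
]})

# Characters 4..9 hold "Jul 26" when the layout is fixed-width.
# "month_day" is a hypothetical column name used only in this sketch.
df["month_day"] = df["created_at"].str[4:10]
print(df["month_day"])
```

Slicing does no validation, so a malformed row silently yields a wrong substring; the numeric 07_26 form still needs the datetime route.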
Upvotes: 3