Reputation: 12200
I have a DataFrame
like this
DATE
0 2021-06-15 14:22:02.424126
1 2021-06-15 14:22:02.424126
I want new columns with the extracted YEAR
and MONTH
.
df['YEAR'] = df.DATE.apply(lambda x: x.year)
df['MONTH'] = df.DATE.apply(lambda x: x.month)
This gives me the expected result
DATE YEAR MONTH
0 2021-06-15 14:23:16.315543 2021 6
1 2021-06-15 14:23:16.315543 2021 6
But I want to do this in one line and one "iteration". I assume this would be faster.
# what I wish
df[['A', 'B']] = df.DATE.apply(lambda x: (x.year, x.month))
But the result confuses me.
DATE YEAR MONTH A B
0 2021-06-15 14:23:16.315543 2021 6 2021 2021
1 2021-06-15 14:23:16.315543 2021 6 6 6
And I am not able to add a axis=1
to apply()
.
That is my full code
import pandas as pd
# That is the initial DataFrame
df = pd.DataFrame(data = {'DATE': [pd.Timestamp.now(),
pd.Timestamp.now()]})
# works fine
df['YEAR'] = df.DATE.apply(lambda x: x.year)
df['MONTH'] = df.DATE.apply(lambda x: x.month)
# what I wish
df[['A', 'B']] = df.DATE.apply(lambda x: (x.year, x.month))
print(df)
Upvotes: 0
Views: 961
Reputation: 863651
Faster is use native pandas functions, which are vectorized, not apply
solutions (because apply are loops under the hood) with accessor .dt
for apply function for Series/ column
:
df['YEAR'] = df.DATE.dt.year
df['MONTH'] = df.DATE.dt.month
But it is possible by converting to list:
df[['A', 'B']] = df.DATE.apply(lambda x: (x.year, x.month)).tolist()
# That is the initial DataFrame with 10k rows
df = pd.DataFrame(data = {'DATE': [pd.Timestamp.now()] * 10000})
In [168]: %%timeit
...: df['YEAR'] = df.DATE.apply(lambda x: x.year)
...: df['MONTH'] = df.DATE.apply(lambda x: x.month)
...:
...:
70.4 ms ± 265 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [169]: %%timeit
...: df[['A', 'B']] = df.DATE.apply(lambda x: (x.year, x.month)).tolist()
...:
...:
64.5 ms ± 7.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [170]: %%timeit
...: df['C'] = df.DATE.dt.year
...: df['D'] = df.DATE.dt.month
...:
...:
2.34 ms ± 17.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Upvotes: 1