Reputation: 357
I have two .csv files that I join in Python with the pandas module. One column holds dates in the format "dd.mm.yyyy".
Now I would like to extract only the month (as a two-digit value with a leading zero) from it for further use.
I have accomplished the job so far, but I had to cheat: Python thinks the part of the string that I need (the month) is the DAY. I don't like half-baked things, so I wanted to ask the community how I can tell Python specifically which part of the date is the month, so that it is returned to me correctly.
Here is what I have so far:
import pandas

def saison(input):
    # Months 04-09 count as summer, everything else as winter.
    if input in ("04", "05", "06", "07", "08", "09"):
        return "Sommer"
    else:
        return "Winter"

df_1 = pandas.read_csv("a.csv", sep=';', names=["DWD_ID", "Datum"], header=0)
df_2 = pandas.read_csv("b.csv", sep=';', names=["VEG", "DWD_ID"], header=0)
df_joined = pandas.merge(df_1, df_2, on="DWD_ID")
df_joined["Datum"] = pandas.to_datetime(df_joined["Datum"])
# The cheat: '%d' is used here because day and month come out swapped after parsing.
df_joined["Saison"] = df_joined["Datum"].apply(lambda x: saison(x.strftime('%d')))
If I use
x.strftime('%m')
instead, it returns the day.
Upvotes: 0
Views: 3357
Reputation: 862396
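First, it seems the month and day were swapped when the dates were parsed, so you need to pass the format argument to to_datetime, for example format='%Y-%d-%m' for the sample below (for "dd.mm.yyyy" strings it would be format='%d.%m.%Y'; see Python's strftime directives):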
import pandas as pd

df = pd.DataFrame({'Date': ['2016-24-02']})
print (df)
         Date
0  2016-24-02
print (pd.to_datetime(df.Date, format='%Y-%d-%m'))
0 2016-02-24
Name: Date, dtype: datetime64[ns]
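To see why the day and month end up swapped, here is a minimal sketch. The value "02.03.2016" is a made-up example meant as 2 March 2016 in the asker's "dd.mm.yyyy" layout. Without a format, pandas should read such an ambiguous string month-first (3 February), while an explicit format pins the first field to the day.

```python
import pandas as pd

# Hypothetical value meaning 2 March 2016 in dd.mm.yyyy notation.
raw = "02.03.2016"

# No format given: the ambiguous string is interpreted month-first.
naive = pd.to_datetime(raw)

# Explicit format: the first field is the day, the second the month.
explicit = pd.to_datetime(raw, format="%d.%m.%Y")

print(naive.month, naive.day)
print(explicit.month, explicit.day)
```

Passing dayfirst=True to to_datetime is another option for day-first strings, but an explicit format leaves less room for guessing.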
After converting (and assigning the result back to df.Date), you can use:
print (df.Date.dt.strftime('%m'))
Another solution is to extract the month with dt.month, convert it to a string with astype(str), and finally pad it with a leading zero using str.zfill:
print (df.Date.dt.month.astype(str).str.zfill(2))
Sample:
start = pd.to_datetime('2015-02-24')
rng = pd.date_range(start, periods=10, freq='m')
df = pd.DataFrame({'Date': rng})
print (df)
        Date
0 2015-02-28
1 2015-03-31
2 2015-04-30
3 2015-05-31
4 2015-06-30
5 2015-07-31
6 2015-08-31
7 2015-09-30
8 2015-10-31
9 2015-11-30
print (df.Date.dt.strftime('%m'))
0 02
1 03
2 04
3 05
4 06
5 07
6 08
7 09
8 10
9 11
Name: Date, dtype: object
print (df.Date.dt.month.astype(str).str.zfill(2))
0 02
1 03
2 04
3 05
4 06
5 07
6 08
7 09
8 10
9 11
Name: Date, dtype: object
Finally, you can use numpy.where with a condition built with isin:
import numpy as np

saison = ["04","05","06","07","08","09"]
df['Saison'] = np.where(df.Date.dt.strftime('%m').isin(saison), 'Sommer','Winter')
print (df)
        Date  Saison
0 2015-02-28  Winter
1 2015-03-31  Winter
2 2015-04-30  Sommer
3 2015-05-31  Sommer
4 2015-06-30  Sommer
5 2015-07-31  Sommer
6 2015-08-31  Sommer
7 2015-09-30  Sommer
8 2015-10-31  Winter
9 2015-11-30  Winter
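If the two-digit string is only needed for the season test, a possible variation is to compare the integer month directly. The sketch below is not part of the approach above; it swaps isin on strings for Series.between on integers, and the four dates are a made-up sample.

```python
import numpy as np
import pandas as pd

# Small made-up sample covering both seasons.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-02-28", "2015-04-30", "2015-09-30", "2015-10-31"])
})

# between() includes both endpoints, so this marks April through September.
is_summer = df["Date"].dt.month.between(4, 9)
df["Saison"] = np.where(is_summer, "Sommer", "Winter")

print(df)
```

This avoids the round trip through strings, at the cost of no longer having the "04"-style value available for other uses.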
Upvotes: 1
Reputation: 29711
You could describe the layout of your date strings through the format argument of to_datetime:
pd.to_datetime(df['date_col'], format="%d.%m.%Y").dt.month.astype(str).str.zfill(2)
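Put together with the question's merge, the whole flow might look like the sketch below. The two in-memory CSV snippets stand in for a.csv and b.csv and contain made-up values, and the "Monat" column name is only an illustration; the column names DWD_ID, Datum, VEG and Saison are taken from the question.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-ins for a.csv and b.csv (invented values).
csv_a = io.StringIO("id;date\n1;24.02.2016\n2;03.07.2016\n")
csv_b = io.StringIO("veg;id\nGras;1\nWald;2\n")

df_1 = pd.read_csv(csv_a, sep=";", names=["DWD_ID", "Datum"], header=0)
df_2 = pd.read_csv(csv_b, sep=";", names=["VEG", "DWD_ID"], header=0)
df_joined = pd.merge(df_1, df_2, on="DWD_ID")

# Tell pandas explicitly that the strings are day.month.year.
df_joined["Datum"] = pd.to_datetime(df_joined["Datum"], format="%d.%m.%Y")

# Two-digit month as a string, then the season derived from it.
df_joined["Monat"] = df_joined["Datum"].dt.strftime("%m")
summer = ["04", "05", "06", "07", "08", "09"]
df_joined["Saison"] = np.where(df_joined["Monat"].isin(summer), "Sommer", "Winter")

print(df_joined)
```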
Upvotes: 1