Kai

Reputation: 357

Define date format in Python / Pandas

I have two .csv files joined in Python with the Pandas module. One column is date with the format "dd.mm.yyyy".

Now I would like to extract only the month (as a 2-digit value with a leading zero) from it for further use.

So far I have gotten the job done, but I had to cheat: Python treats the part of the string that I want as the DAY. I don't like half-baked solutions, so I would like to ask the community how I can tell Python explicitly which part of the date is the month, so that it returns the month to me.

Here is what I have so far:

import pandas

def saison(input):
    if input == "04" or input == "05" or input == "06" or input == "07" or input == "08" or input == "09":
        return "Sommer"
    else:
        return "Winter"

df_1 = pandas.read_csv("a.csv", sep=';', names=["DWD_ID", "Datum"], header=0)
df_2 = pandas.read_csv("b.csv", sep=';', names=["VEG", "DWD_ID"], header=0)
df_joined = pandas.merge(df_1, df_2, on="DWD_ID")
df_joined["Datum"] = pandas.to_datetime(df_joined["Datum"])

df_joined["Saison"] = saison(df_joined["Datum"].apply(lambda x: x.strftime('%d')))

If I use

x.strftime('%m') 

instead, it returns the day.

Upvotes: 0

Views: 3357

Answers (2)

jezrael

Reputation: 862396

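First, it seems month and day were swapped when the dates were parsed, so you need to pass a format argument to to_datetime that matches the layout of your strings (see Python's strftime directives). For example, for strings laid out as year-day-month, use format='%Y-%d-%m':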

import pandas as pd

df = pd.DataFrame({'Date': ['2016-24-02']})
print (df)
         Date
0  2016-24-02

print (pd.to_datetime(df.Date, format='%Y-%d-%m'))
0   2016-02-24
Name: Date, dtype: datetime64[ns]
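For the "dd.mm.yyyy" layout described in the question, the matching format string would be '%d.%m.%Y'. The following is only a minimal sketch: the two sample dates are invented for illustration and are not taken from the asker's files. Without an explicit format, pandas defaults to month-first parsing for ambiguous strings such as "03.04.2016", which is one likely explanation for the swapped day and month.

```python
import pandas as pd

# Hypothetical sample values in the question's "dd.mm.yyyy" layout
dates = pd.Series(["03.04.2016", "24.02.2016"])

# An explicit format removes any ambiguity about which field is the month
parsed = pd.to_datetime(dates, format="%d.%m.%Y")

# Two-digit month with leading zero, as strings
months = parsed.dt.strftime("%m")
print(months.tolist())  # ['04', '02']
```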

After converting you can use:

dt.strftime:

print (df.Date.dt.strftime('%m'))

Another solution is to extract the month, convert it to a string, and then pad it with a leading zero using zfill:

print (df.Date.dt.month.astype(str).str.zfill(2))

Sample:

start = pd.to_datetime('2015-02-24')
rng = pd.date_range(start, periods=10, freq='m')

df = pd.DataFrame({'Date': rng})  
print (df)
        Date
0 2015-02-28
1 2015-03-31
2 2015-04-30
3 2015-05-31
4 2015-06-30
5 2015-07-31
6 2015-08-31
7 2015-09-30
8 2015-10-31
9 2015-11-30


print (df.Date.dt.strftime('%m'))
0    02
1    03
2    04
3    05
4    06
5    07
6    08
7    09
8    10
9    11
Name: Date, dtype: object

print (df.Date.dt.month.astype(str).str.zfill(2))
0    02
1    03
2    04
3    05
4    06
5    07
6    08
7    09
8    10
9    11
Name: Date, dtype: object

Finally, you can use numpy.where with a condition built with isin:

import numpy as np

saison = ["04","05","06","07","08","09"]
df['Saison'] = np.where(df.Date.dt.strftime('%m').isin(saison), 'Sommer','Winter')
print (df)
        Date  Saison
0 2015-02-28  Winter
1 2015-03-31  Winter
2 2015-04-30  Sommer
3 2015-05-31  Sommer
4 2015-06-30  Sommer
5 2015-07-31  Sommer
6 2015-08-31  Sommer
7 2015-09-30  Sommer
8 2015-10-31  Winter
9 2015-11-30  Winter
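If the zero-padded strings are only needed for the season lookup, the comparison could also be done on the integer month directly. This is a sketch of an alternative, not part of the approach above: it assumes the question's "dd.mm.yyyy" layout and its rule that April through September count as "Sommer", and the sample dates are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical dates around the season boundaries
df = pd.DataFrame({"Datum": ["15.03.2016", "01.04.2016", "30.09.2016", "01.10.2016"]})
df["Datum"] = pd.to_datetime(df["Datum"], format="%d.%m.%Y")

# between() is inclusive on both ends by default: months 4 through 9 are summer
is_summer = df["Datum"].dt.month.between(4, 9)
df["Saison"] = np.where(is_summer, "Sommer", "Winter")
print(df["Saison"].tolist())  # ['Winter', 'Sommer', 'Sommer', 'Winter']
```

This avoids the string round trip; the strftime('%m') or zfill(2) variants remain the ones to use when the two-digit text itself is needed later.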

Upvotes: 1

Nickil Maveli

Reputation: 29711

You could pass the format of your date strings via the format argument of to_datetime:

pd.to_datetime(df['date_col'], format="%d.%m.%Y").dt.month.astype(str).str.zfill(2)
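Applied to the joined frame from the question, the one-liner above might look like the sketch below. The CSV contents are invented stand-ins for a.csv and b.csv (read from in-memory buffers so the example is self-contained), and the "Monat" column name is hypothetical.

```python
import io
import pandas as pd

# Invented stand-ins for a.csv and b.csv, using the question's ';' separator
csv_a = io.StringIO("id;date\n1;24.02.2016\n2;05.07.2016\n")
csv_b = io.StringIO("veg;id\nWald;1\nWiese;2\n")

df_1 = pd.read_csv(csv_a, sep=";", names=["DWD_ID", "Datum"], header=0)
df_2 = pd.read_csv(csv_b, sep=";", names=["VEG", "DWD_ID"], header=0)
df_joined = pd.merge(df_1, df_2, on="DWD_ID")

# Parse with the explicit day.month.year format, then build the padded month
df_joined["Datum"] = pd.to_datetime(df_joined["Datum"], format="%d.%m.%Y")
df_joined["Monat"] = df_joined["Datum"].dt.month.astype(str).str.zfill(2)
print(df_joined["Monat"].tolist())  # ['02', '07']
```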

Upvotes: 1
