thor
thor

Reputation: 281

Pandas dealing with other strings not included in mapping

I have code from SO that looks like this (see answer to my question): How to parse different string date formats?

The code works perfectly but I have had to add new data to the table that I did not anticipate (see index 10-16):

input data:

    period             signed
0   Q2 '20 Base       01/01/20
1   Q3 '20 Base       01/01/20
2   Q1 '21 Base       01/01/20
3   February '20 Base 01/01/20
4   March '20 Peak    01/01/20
5   Summer 22 Base    01/01/20
6   Winter 20 Peak    01/01/20
7   Summer 21 Base    02/01/20
8   Year 2021         03/01/20
9   October '21 Peak  04/01/20 
10  12/03/20 base     05/01/20
11  Week 8 '20        06/01/20
12  Weekend base      07/01/20
13  Monday base       08/01/20
14  BOM base          09/01/20
15  Year 2020         10/01/20
16  12-14 April '20   11/01/20

For everything in my datemap I would like to return. But for all other string (index 10-16) not included in the mapping I want to return the date in Column 'signed' to the 4 new column: 1) day 2) month 3) quarter 4) year.

This is the code so far:

datemap = { 'January' :  {'day' : 1, 'month' : 1, 'quarter' : 1}, 
            'February' : {'day' : 1, 'month' : 2, 'quarter' : 1}, 
            'March' :    {'day' : 1, 'month' : 3, 'quarter' : 1}, 
            # and so on ...
            'Spring' : {'day' : 1, 'month' : 1, 'quarter' : 1}, 
            'Summer' : {'day' : 1, 'month' : 4, 'quarter' : 2}, 
            'Fall' :   {'day' : 1, 'month' : 7, 'quarter' : 3}, 
            'Winter' : {'day' : 1, 'month' : 10, 'quarter' : 4}, 
            'Q1' : {'day' : 1, 'month' : 1, 'quarter' : 1}, 
            'Q2' : {'day' : 1, 'month' : 4, 'quarter' : 2}, 
            'Q3' : {'day' : 1, 'month' : 7, 'quarter' : 3}, 
            'Q4' : {'day' : 1, 'month' : 10, 'quarter' : 4}, 
            'Year' : {'day' : 1, 'month' : 1, 'quarter' : 1} }

df['day'] = df.apply (lambda r: datemap[r['period'].split()[0]]['day'], axis=1)
df['month'] = df.apply (lambda r: datemap[r['period'].split()[0]]['month'], axis=1)
df['quarter'] = df.apply (lambda r: datemap[r['period'].split()[0]]['quarter'], axis=1)
df['year'] = df.apply (lambda r: "20" + r['period'].split()[1][-2:], axis=1)

output data

                     day  month quarter year
0   Q2 '20 Base         01  04    2       2020
1   Q3 '20 Peak         01  07    3       2020
2   Q1 '21 Base         01  01    1       2021
3   February '20 Base   01  02    1       2020
4   March '20 Peak      01  03    1       2020
5   Summer 22 Base      01  04    2       2022
6   Winter 20 Peak      01  10    4       2020
7   Summer 21 Base      01  04    2       2021
8   Year 2021           01  01    1       2021
9   October '21 Base    01  10    4       2021
10  12/03/20 base       05  01    1       2020
11  Week 8 '20          06  01    1       2020
12  Weekend base        07  01    1       2020
13  Monday base         08  01    1       2020
14  BOM base            09  01    1       2020
15  Year 2020           10  01    1       2020
16  12-14 April '20     11  01    1       2020 

Upvotes: 0

Views: 52

Answers (1)

Boendal
Boendal

Reputation: 2526

You can do it the following way. It's not a nice one liner anymore but it works:

import calendar

def get_datemap_data(row,key,key_datemap):
    try:
        if key_datemap == "year":
            if key in datemap:
                return row['period'].split()[1][-2:]
            else:
                raise ValueError
        else:
            return datemap[key][key_datemap]
    except KeyError:
        signed_split = row["signed"].split("/")
        map_to_signed = {"day":0,"month":1}
        if key_datemap == "quarter":
            return datemap[calendar.month_name[int(signed_split[1])]]["quarter"]
        return int(signed_split[map_to_signed[key_datemap]])
    except ValueError:
        signed_split = row["signed"].split("/")
        return signed_split[2]


df['day'] = df.apply (lambda r: get_datemap_data(r,r['period'].split()[0],'day'), axis=1)
df['month'] = df.apply (lambda r: get_datemap_data(r,r['period'].split()[0],'month'), axis=1)
df['quarter'] = df.apply (lambda r: get_datemap_data(r,r['period'].split()[0],'quarter'), axis=1)
df['year'] = df.apply (lambda r: "20" + get_datemap_data(r,r['period'].split()[0],'year'), axis=1)

Output:

               period    signed  day  month  quarter  year
0         Q2 '20 Base  01/01/20    1      4        2  2020
1         Q3 '20 Base  01/01/20    1      7        3  2020
2         Q1 '21 Base  01/01/20    1      1        1  2021
3   February '20 Base  01/01/20    1      2        1  2020
4      March '20 Peak  01/01/20    1      3        1  2020
5      Summer 22 Base  01/01/20    1      4        2  2022
6      Winter 20 Peak  01/01/20    1     10        4  2020
7      Summer 21 Base  02/01/20    1      4        2  2021
8           Year 2021  03/01/20    1      1        1  2021
9    October '21 Peak  04/01/20    1     10        4  2021
10      12/03/20 base  05/01/20    5      1        1  2020
11         Week 8 '20  06/01/20    6      1        1  2020
12       Weekend base  07/01/20    7      1        1  2020
13        Monday base  08/01/20    8      1        1  2020
14           BOM base  09/01/20    9      1        1  2020
15          Year 2021  10/01/20    1      1        1  2021
16    12-14 April '20  11/01/20   11      1        1  2020

Some Remark

  1. In case you want all months and day with a leading zero you have to convert it into a string and add a zero.

  2. In your desired Output there are some errors:

    2.a. Index 0: That should be Quarter 2 not 1

    2.b. Index 8 and 15: Both are the same but you wish different outputs? Impossible. I took the output at Index 8. If you want it based on signed delete the entry Year from your datemap.

Upvotes: 1

Related Questions