NuValue
NuValue

Reputation: 463

Organizing rows following cartesian product of 3 columns in Pandas

I have 3 lists like in the following reproducible example:

year = [2015, 2016, 2017] 
month = [1, 2] 
ids = ['x', 'y', 'z', 'w'] 

What I would like to create is a very simple task of creating a final dataframe where I had the 3 columns sorting its rows as a permutation or cartesian product of the column values.

Something like:

enter image description here

At the end I would like to add an 'Epoque' column where the reference is: December of 2014 is equal to '1', January of 2015 is equal to '2', February of 2015 is equal to '3', and so on (The sequence will continue with initial reference Dec-2014 = '1' (for the 'Epoque' value))...

The final desired output would have this look:

enter image description here

EDIT:

Question edited thanks to the great feedback of @jezrael. He provided me the line missing to achieve the desired df but only missing the 'Epoque' column.

My code suggested would be the following (lacking the 'Epoque' desired column):

import itertools
s = [ [ 2015, 2016, 2017], [1, 2], ['x', 'y', 'z', 'w'] ]
z = list(itertools.product(*s))
df = pd.DataFrame(z) # Trivial line provided kindly by @jezrael I didn't know.

Any help regarding how to achieve the 'Epoque' column efficiently, I would highly appreciate it. Thanks.

Upvotes: 3

Views: 395

Answers (3)

Statistic Dean
Statistic Dean

Reputation: 5280

One solution is to go through all variables with multiple for loop.

#Set the start date of your epoch (Here november 2014 is epoch 0)
month_0 = 11
year_0 = 2014
year_col = []
month_col = []
id_col = []
epoch_col = []
for j1 in ids:
    for j2 in month:
        for j3 in year:
            year_col.append(j3)
            month_col.append(j2)
            id_col.append(j1)
            epoch = (j3-year_0)*12 +(j2-month_0)
            epoch_col.append(epoch)
df = pd.DataFrame({'year':year_col,'month':month_col,'id':id_col,'epoch':epoch_col})

Upvotes: 1

jpp
jpp

Reputation: 164773

You can use Pandas datetime:

df = pd.DataFrame(z, columns=['year', 'month', 'id'])

base = pd.Timestamp('2014-12-01')
dates = pd.to_datetime(df[['year', 'month']].assign(day=1))

df['epoch'] = dates.dt.to_period('M') - base.to_period('M') + 1

# alternative
df['epoch'] = (dates.dt.year - base.year)*12 + (dates.dt.month - base.month) + 1

print(df)

    year  month id  epoch
0   2015      1  x      2
1   2015      1  y      2
2   2015      1  z      2
3   2015      1  w      2
4   2015      2  x      3
5   2015      2  y      3
...
18  2017      1  z     26
19  2017      1  w     26
20  2017      2  x     27
21  2017      2  y     27
22  2017      2  z     27
23  2017      2  w     27

Upvotes: 1

jezrael
jezrael

Reputation: 863166

Use map by dictionary created by date_range defined with start and end dates:

import itertools
s = [ [ 2015, 2016, 2017], [1, 2], ['x', 'y', 'z', 'w'] ]
z = list(itertools.product(*s))

a = 'Dec-2014'
b = 'Dec-2018'
r = pd.date_range(a, b, freq='MS')
d = dict(zip(r, range(1, len(r) + 1)))

df = pd.DataFrame(z, columns=['year','month','id'])
df['epoch'] = pd.to_datetime(df[['year','month']].assign(day=1)).map(d)

print (df)
    year  month id  epoch
0   2015      1  x      2
1   2015      1  y      2
2   2015      1  z      2
3   2015      1  w      2
4   2015      2  x      3
5   2015      2  y      3
6   2015      2  z      3
7   2015      2  w      3
8   2016      1  x     14
9   2016      1  y     14
10  2016      1  z     14
11  2016      1  w     14
12  2016      2  x     15
13  2016      2  y     15
14  2016      2  z     15
15  2016      2  w     15
16  2017      1  x     26
17  2017      1  y     26
18  2017      1  z     26
19  2017      1  w     26
20  2017      2  x     27
21  2017      2  y     27
22  2017      2  z     27
23  2017      2  w     27

Upvotes: 1

Related Questions