Reputation: 463
I have 3 lists like in the following reproducible example:
year = [2015, 2016, 2017]
month = [1, 2]
ids = ['x', 'y', 'z', 'w']
What I would like to create is a very simple task of creating a final dataframe where I had the 3 columns sorting its rows as a permutation or cartesian product of the column values.
Something like:
At the end I would like to add an 'Epoque' column where the reference is: December of 2014 is equal to '1', January of 2015 is equal to '2', February of 2015 is equal to '3', and so on (The sequence will continue with initial reference Dec-2014 = '1' (for the 'Epoque' value))...
The final desired output would have this look:
EDIT:
Question edited thanks to the great feedback of @jezrael. He provided me the line missing to achieve the desired df but only missing the 'Epoque' column.
My code suggested would be the following (lacking the 'Epoque' desired column):
import itertools
s = [ [ 2015, 2016, 2017], [1, 2], ['x', 'y', 'z', 'w'] ]
z = list(itertools.product(*s))
df = pd.DataFrame(z) # Trivial line provided kindly by @jezrael I didn't know.
Any help regarding how to achieve the 'Epoque' column efficiently, I would highly appreciate it. Thanks.
Upvotes: 3
Views: 395
Reputation: 5280
One solution is to go through all variables with multiple for loop.
#Set the start date of your epoch (Here november 2014 is epoch 0)
month_0 = 11
year_0 = 2014
year_col = []
month_col = []
id_col = []
epoch_col = []
for j1 in ids:
for j2 in month:
for j3 in year:
year_col.append(j3)
month_col.append(j2)
id_col.append(j1)
epoch = (j3-year_0)*12 +(j2-month_0)
epoch_col.append(epoch)
df = pd.DataFrame({'year':year_col,'month':month_col,'id':id_col,'epoch':epoch_col})
Upvotes: 1
Reputation: 164773
You can use Pandas datetime
:
df = pd.DataFrame(z, columns=['year', 'month', 'id'])
base = pd.Timestamp('2014-12-01')
dates = pd.to_datetime(df[['year', 'month']].assign(day=1))
df['epoch'] = dates.dt.to_period('M') - base.to_period('M') + 1
# alternative
df['epoch'] = (dates.dt.year - base.year)*12 + (dates.dt.month - base.month) + 1
print(df)
year month id epoch
0 2015 1 x 2
1 2015 1 y 2
2 2015 1 z 2
3 2015 1 w 2
4 2015 2 x 3
5 2015 2 y 3
...
18 2017 1 z 26
19 2017 1 w 26
20 2017 2 x 27
21 2017 2 y 27
22 2017 2 z 27
23 2017 2 w 27
Upvotes: 1
Reputation: 863166
Use map
by dictionary created by date_range
defined with start and end date
s:
import itertools
s = [ [ 2015, 2016, 2017], [1, 2], ['x', 'y', 'z', 'w'] ]
z = list(itertools.product(*s))
a = 'Dec-2014'
b = 'Dec-2018'
r = pd.date_range(a, b, freq='MS')
d = dict(zip(r, range(1, len(r) + 1)))
df = pd.DataFrame(z, columns=['year','month','id'])
df['epoch'] = pd.to_datetime(df[['year','month']].assign(day=1)).map(d)
print (df)
year month id epoch
0 2015 1 x 2
1 2015 1 y 2
2 2015 1 z 2
3 2015 1 w 2
4 2015 2 x 3
5 2015 2 y 3
6 2015 2 z 3
7 2015 2 w 3
8 2016 1 x 14
9 2016 1 y 14
10 2016 1 z 14
11 2016 1 w 14
12 2016 2 x 15
13 2016 2 y 15
14 2016 2 z 15
15 2016 2 w 15
16 2017 1 x 26
17 2017 1 y 26
18 2017 1 z 26
19 2017 1 w 26
20 2017 2 x 27
21 2017 2 y 27
22 2017 2 z 27
23 2017 2 w 27
Upvotes: 1