Reputation: 23099
Unable to find a solution for this so apologies if this seems simple.
I have a df as follows :
ID, Week
5, 1
6, 1
7, 1
I have a list of reason codes as follows
['Work', 'Holiday', 'Sick', 'Jury']
What I'm trying to do is add this to my current dataframe but multiply this by each unique ID
so i would have something like (i'll only use 1 unique ID for brevity).,
ID, Week, Reason
5, 1, 'Work'
5, 1, 'Holiday'
5, 1, 'Sick',
5, 1, 'Jury'
I have tried all manner of concats, with varying axis's but I'm none-the-wiser how to approach this.
any help would be greatly appreciated.
Upvotes: 1
Views: 80
Reputation: 25239
I think you just need to create a DataFrame from
df_codes = pd.DataFrame({'codes': ['Work', 'Holiday', 'Sick', 'Jury']})
After that you just need to do cross join DataFrames without common columns as in this thread join-two-dataframes-with-no-common-columns-for-calculations
base on that thread, after create df_codes
, you just need this command:
df_new = df.assign(a=1).merge(df_codes.assign(a=1)).drop('a', 1)
Output:
Out[148]:
ID Week codes
0 5 1 Work
1 5 1 Holiday
2 5 1 Sick
3 5 1 Jury
4 6 1 Work
5 6 1 Holiday
6 6 1 Sick
7 6 1 Jury
8 7 1 Work
9 7 1 Holiday
10 7 1 Sick
11 7 1 Jury
Upvotes: 1
Reputation: 323226
Let us do unnesting
df['Reason']=[l]*len(df)
unnesting(df,['Reason'])
Out[1014]:
Reason ID Week
0 Work 5 1
0 Holiday 5 1
0 Sick 5 1
0 Jury 5 1
1 Work 6 1
1 Holiday 6 1
1 Sick 6 1
1 Jury 6 1
2 Work 7 1
2 Holiday 7 1
2 Sick 7 1
2 Jury 7 1
# attached self-define function
def unnesting(df, explode):
idx=df.index.repeat(df[explode[0]].str.len())
df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
df1.index=idx
return df1.join(df.drop(explode,1),how='left')
Or we do cross join
df.merge(pd.DataFrame({'Reason':l}).assign(Week=1))
Out[1020]:
ID Week Reason
0 5 1 Work
1 5 1 Holiday
2 5 1 Sick
3 5 1 Jury
4 6 1 Work
5 6 1 Holiday
6 6 1 Sick
7 6 1 Jury
8 7 1 Work
9 7 1 Holiday
10 7 1 Sick
11 7 1 Jury
Upvotes: 2
Reputation: 28243
let codes
be the list of reasons
codes = ['Work', 'Holiday', 'Sick', 'Jury']
Assign the same codes to all the rows of the data frame. The syntax on the RHS just dupliates the list N times, where N is the length of the data frame
df['codes'] = [codes] * len(df)
Then stack the codes
column, i.e. change from horizontal to vertical format. In this step it is also necessary to set the ID
& Week
columns as indexes & then reset them after the stack operation is done.
df.set_index(['ID','Week']).codes.apply(pd.Series).stack().reset_index(['ID', 'Week'])
Outputs:
ID Week 0
0 5 1 Work
1 5 1 Holiday
2 5 1 Sick
3 5 1 Jury
0 6 1 Work
1 6 1 Holiday
2 6 1 Sick
3 6 1 Jury
0 7 1 Work
1 7 1 Holiday
2 7 1 Sick
3 7 1 Jury
The only thing left is to assign an appropriate name to the newly created column, and if your code depends on the index value, fix it appropriately. In this version, the index values are carried over from the original dataframe & repeated four times
Upvotes: 1
Reputation: 75080
You can use itertools
for this:
import itertools
df_new=(pd.DataFrame(list(itertools.product(df.ID,df.Week,reason_codes)),
columns=df.columns.tolist()+['Reason']).drop_duplicates().reset_index(drop=True))
print(df_new)
ID Week Reason
0 5 1 Work
1 5 1 Holiday
2 5 1 Sick
3 5 1 Jury
4 6 1 Work
5 6 1 Holiday
6 6 1 Sick
7 6 1 Jury
8 7 1 Work
9 7 1 Holiday
10 7 1 Sick
11 7 1 Jury
Upvotes: 2