Reputation: 381
I have a dataframe of groups of 3s like:
group value1 value2 value3
1 A1 A2 A3
1 B1 B2 B3
1 C1 C2 C3
2 D1 D2 D3
2 E1 E2 E3
2 F1 F2 F3
...
I'd like to re-order the cells within each group according to a fixed rule by their 'positions', and repeat the same operation over all groups.
This 'fixed' rule will work like below:
Input:
group value1 value2 value3
1 position1 position2 position3
1 position4 position5 position6
1 position7 position8 position9
Output:
group value1 value2 value3
1 position1 position8 position6
1 position4 position2 position9
1 position7 position5 position3
Eventually the dataframe should look like (if this makes sense):
group value1 value2 value3
1 A1 C2 B3
1 B1 A2 C3
1 C1 B2 A3
2 D1 F2 E3
2 E1 D2 F3
2 F1 E2 D3
...
I know how to re-order them if the dataframe only has one group - basically create a temporary variable to store values, get each cell by .loc, and overwrite each cell with desired values.
However, even if we only have 1 group of 3 rows, this is still an apparently silly and tedious way.
My question is: can we possibly
Upvotes: 1
Views: 94
Reputation: 75080
Here is a proposal which uses numpy indexing with reshaping on each group.
Setup:
Lets assume your original df and the position dataframes are as below:
d = {'group': [1, 1, 1, 2, 2, 2],
'value1': ['A1', 'B1', 'C1', 'D1', 'E1', 'F1'],
'value2': ['A2', 'B2', 'C2', 'D2', 'E2', 'F2'],
'value3': ['A3', 'B3', 'C3', 'D3', 'E3', 'F3']}
out_d = {'group': [1, 1, 1, 2, 2, 2],
'value1': ['position1', 'position4', 'position7',
'position1', 'position4', 'position7'],
'value2': ['position8', 'position2', 'position5',
'position8', 'position2', 'position5'],
'value3': ['position6', 'position9', 'position3',
'position6', 'position9', 'position3']}
df = pd.DataFrame(d)
out = pd.DataFrame(out_d)
print("Original dataframe :\n\n",df,"\n\n Position dataframe :\n\n",out)
Original dataframe :
group value1 value2 value3
0 1 A1 A2 A3
1 1 B1 B2 B3
2 1 C1 C2 C3
3 2 D1 D2 D3
4 2 E1 E2 E3
5 2 F1 F2 F3
Position dataframe :
group value1 value2 value3
0 1 position1 position8 position6
1 1 position4 position2 position9
2 1 position7 position5 position3
3 2 position1 position8 position6
4 2 position4 position2 position9
5 2 position7 position5 position3
Working Solution:
Method 1: : Creating a function and use in df.groupby.apply
#remove letters and extract only position numbers and subtract 1
#since python indexing starts at 0
o = out.applymap(lambda x: int(''.join(re.findall('\d+',x)))-1 if type(x)==str else x)
#Merge this output with original dataframe
df1 = df.merge(o,on='group',left_index=True,right_index=True,suffixes=('','_pos'))
# Build a function which rearranges the df based on the position df:
def fun(x):
c = x.columns.str.contains("_pos")
return pd.DataFrame(np.ravel(x.loc[:,~c])[np.ravel(x.loc[:,c])]
.reshape(x.loc[:,~c].shape),
columns=x.columns[~c])
output = (df1.groupby("group").apply(fun).reset_index("group")
.reset_index(drop=True))
print(output)
group value1 value2 value3
0 1 A1 C2 B3
1 1 B1 A2 C3
2 1 C1 B2 A3
3 2 D1 F2 E3
4 2 E1 D2 F3
5 2 F1 E2 D3
Method 2: Iterate through each group and re-arrange:
o = out.applymap(lambda x: int(''.join(re.findall('\d+',x)))-1 if type(x)==str else x)
df1 = df.merge(o,on='group',left_index=True,right_index=True,
suffixes=('','_pos')).set_index("group")
idx = df1.index.unique()
l = []
for i in idx:
v = df1.loc[i]
c = v.columns.str.contains("_pos")
l.append(np.ravel(v.loc[:,~c])[np.ravel(v.loc[:,c])].reshape(v.loc[:,~c].shape))
final = pd.DataFrame(np.concatenate(l),index=df1.index,
columns=df1.columns[~c]).reset_index()
print(final)
group value1 value2 value3
0 1 A1 C2 B3
1 1 B1 A2 C3
2 1 C1 B2 A3
3 2 D1 F2 E3
4 2 E1 D2 F3
5 2 F1 E2 D3
Upvotes: 2