Reputation: 551
Given a dataframe (https://pastebin.com/MdqWz4Ke)
# some data
data3 = [["Alex","Tampa","A23","1","Ax","Red"],
["Alex","Tampa","A23","1","Ay","Blue"],
["Alex","Tampa","B43","1","Bx","Green"],
["Alex","Tampa","B43","1","By","White"],
["Alex","Tampa","C55","1","Cx","Red"],
["Alex","Tampa","C55","1","Cy","White"],
["Alex","Tampa","C55","2","Cx","Purple"],
["Alex","Tampa","C55","2","Cy","Black"],
["Tim","San Diego","A23","1","Ax","Green"],
["Tim","San Diego","A23","1","Ay","Black"],
["Tim","San Diego","B43","1","Bx","Yellow"],
["Tim","San Diego","B43","1","By","Black"],
["Tim","San Diego","C55","1","Cx","Pink"],
["Tim","San Diego","C55","1","Cy","Orange"],
["Tim","San Diego","A23","2","Ax","Green"],
["Tim","San Diego","A23","2","Ay","Red"],
["Tim","San Diego","B43","2","Bx",""],
["Tim","San Diego","B43","2","By",""],
["Mark","Houston","A23","1","Ax","Purple"],
["Mark","Houston","A23","1","Ay","Yellow"],
["Mark","Houston","B43","1","Bx","Gray"],
["Mark","Houston","B43","1","By","White"],
["Mark","Houston","C55","1","Cx",""],
["Mark","Houston","C55","1","Cy",""],
["Anthony","Seattle","A23","","Ax","Orange"],
["Anthony","Seattle","A23","","Ay","Black"],
["Anthony","Seattle","B43","","Bx","Red"],
["Anthony","Seattle","B43","","By","Black"],
["Anthony","Seattle","C55","","Cx","Blue"],
["Anthony","Seattle","C55","","Cy","Pink"]]
# create dataframe
df3 = pd.DataFrame(data3,columns=[
"Name","City","Domain","Sequence","Group","Value"])
How can I compare values in groups and conditionally populate columns with those values?
# add Compared columns
df3["Compared Group"] = ""
df3["Compared Value"] = ""
# replace nulls with np.NaN
df3.replace(r"^s*$", np.nan, regex=True, inplace = True)
# fillna for missing Sequence and Value
df3.fillna({"Sequence":"N/A","Value":"NULL"},inplace=True)
# expected result
result = [["Alex","Tampa","A23","1","Ax","Red","Ay","Blue"],
["Alex","Tampa","B43","1","Bx","Green","By","White"],
["Alex","Tampa","C55","1","Cx","Red","Cy","White"],
["Alex","Tampa","C55","2","Cx","Purple","Cy","Black"],
["Tim","San Diego","A23","1","Ax","Green","Ay","Black"],
["Tim","San Diego","A23","2","Ax","Green","Ay","Red"],
["Tim","San Diego","B43","1","Bx","Yellow","By","Black"],
["Tim","San Diego","B43","2","Bx","NULL","By","NULL"],
["Tim","San Diego","C55","1","Cx","Pink","Cy","Orange"],
["Mark","Houston","A23","1","Ax","Purple","Ay","Yellow"],
["Mark","Houston","B43","1","Bx","Gray","By","White"],
["Mark","Houston","C55","1","Cx","NULL","Cy","NULL"],
["Anthony","Seattle","A23","","Ax","Orange","Ay","Black"],
["Anthony","Seattle","B43","","Bx","Red","By","Black"],
["Anthony","Seattle","C55","","Cx","Blue","Cy","Pink"]]
result_df = pd.DataFrame(result,columns=[
"Name","City","Domain","Sequence","Group",
"Value","Compared Group","Compared Value"])
Notes:
If a person has a Group
value that matches another (Ax
to Ay
,
Bx
to By
, e.g.) and the Sequence
number is the same, populate
the Compared Group
and Compared Value
columns with the respective
Group
and Value
.
The City
and Domain
are not considered in the comparison, but all
columns need to remain.
Some rows will not have a Sequence
number, so I filled nulls with
N/A
, in order to group on some value. Also, some rows won't have a
value in the Value
column, so I filled these with NULL
, to fill
something when populating the Compared Values
column.
I've created a dictionary mapping the Group
values
# map groups with dictionary
group_dict = {"Ax":"Ay","Bx":"By","Cx":"Cy"}
and created the groupby
object
# groupby
grouped = df3.groupby(["Name","Sequence","Domain","Group"], group_keys=False)
My original plan was to .loc
in order to populate the Compared
columns and maybe using map
with the dictionary, but when trying to access the values within the groups...
for name in df3["Name"]:
print(grouped.get_group((name,"Ax")))
I get the following error:
ValueError: must supply a a same-length tuple to get_group with multiple grouping keys
Which I assume is because not all groups will contain the same number and types of Group
values (for example, Tim hasSequence
1 and 2 for Ax
while Alex has only Sequence
1 for Ax
). I'm not sure how to proceed from here in order to merge and transform these rows.
Upvotes: 0
Views: 852
Reputation: 25239
Base on your sample, you may try set_index
by the customized groupid of Group
and unstack
. Next, pretty-up column names and reset_index
back
s = df3.groupby(["Name","Sequence","Domain",]).Group.cumcount()
df_out = (df3.set_index(["Name", "City", "Sequence","Domain", s])
.unstack()
.sort_index(level=1, axis=1))
df_out.columns = (df_out.columns.set_levels(['', 'Compared '], level=1)
.map('{0[1]}{0[0]}'.format))
df_out.reset_index()
Out[297]:
Name City Sequence Domain Group Value Compared Group \
0 Alex Tampa 1 A23 Ax Red Ay
1 Alex Tampa 1 B43 Bx Green By
2 Alex Tampa 1 C55 Cx Red Cy
3 Alex Tampa 2 C55 Cx Purple Cy
4 Anthony Seattle N/A A23 Ax Orange Ay
5 Anthony Seattle N/A B43 Bx Red By
6 Anthony Seattle N/A C55 Cx Blue Cy
7 Mark Houston 1 A23 Ax Purple Ay
8 Mark Houston 1 B43 Bx Gray By
9 Mark Houston 1 C55 Cx NULL Cy
10 Tim Los Angeles 1 A23 Ax Green Ay
11 Tim Los Angeles 1 B43 Bx Yellow By
12 Tim Los Angeles 1 C55 Cx Pink Cy
13 Tim Los Angeles 2 A23 Ax Green Ay
14 Tim Los Angeles 2 B43 Bx NULL By
Compared Value
0 Blue
1 White
2 White
3 Black
4 Black
5 Black
6 Pink
7 Yellow
8 White
9 NULL
10 Black
11 Black
12 Orange
13 Red
14 NULL
Upvotes: 0
Reputation: 150735
Given your sample data, you could do:
def myfunc(x):
# extract rows 0 2 4 ...
# reset_index rename the rows as 0 1 2 ...
df1 = x.iloc[::2].reset_index(drop=True)
# extract rows 1 3 5
df2 = x.iloc[1::2].reset_index(drop=True)
# merge put the two dataframes next together
return df1.merge(df2, left_index=True, right_index=True)
# group by the other columns and select only ['Group', 'Value']
(df3.groupby(['Name', 'City', 'Domain', 'Sequence'])[['Group','Value']]
.apply(myfunc) # concatenate the rows
.reset_index(-1, drop=True) # drop the unnecessary index
.reset_index() # make the other original columns as data instead of index
)
Output:
Name City Domain Sequence Group_x Value_x Group_y Value_y
0 Alex Tampa A23 1 Ax Red Ay Blue
1 Alex Tampa B43 1 Bx Green By White
2 Alex Tampa C55 1 Cx Red Cy White
3 Alex Tampa C55 2 Cx Purple Cy Black
4 Anthony Seattle A23 N/A Ax Orange Ay Black
5 Anthony Seattle B43 N/A Bx Red By Black
6 Anthony Seattle C55 N/A Cx Blue Cy Pink
7 Mark Houston A23 1 Ax Purple Ay Yellow
8 Mark Houston B43 1 Bx Gray By White
9 Mark Houston C55 1 Cx NULL Cy NULL
10 Tim Los Angeles A23 1 Ax Green Ay Black
11 Tim Los Angeles A23 2 Ax Green Ay Red
12 Tim Los Angeles B43 1 Bx Yellow By Black
13 Tim Los Angeles B43 2 Bx NULL By NULL
14 Tim Los Angeles C55 1 Cx Pink Cy Orange
Upvotes: 1