How to concatenate two cells in a pandas column based on some conditions?

Question

Hello I have this pandas dataframe:


Key      Predictions    
C10D1   1
C11D1   8
C11D2   2
C12D1   2
C12D2   8
C13D1   3
C13D2   9
C14D1   4
C14D2   9
C15D1   8
C15D2   3
C1D1    5
C2D1    7
C3D1    4
C4D1    1
C4D2    9
C5D1    3
C5D2    2
C6D1    1
C6D2    0
C7D1    8
C7D2    6
C8D1    3
C8D2    3
C9D1    5
C9D2    1

I want to concatenate the cells of the "Predictions" column whose "Key" values match in their leading characters (up to 4). For example, the "Key" column contains "C11D1" and "C11D2". Since both start with "C11", I would like to concatenate the "Predictions" values of the rows keyed "C11D1" and "C11D2". The result should be:

       Predictions
Key 
C10     1
C11     82
C12     28
and so on

RavinderSingh13 · Accepted Answer

EDIT: Since the OP wants to concatenate the values that share the same index, I am adding that solution here.

df.groupby(df['Key'].replace(regex=True,to_replace=r'(C[0-9]+).*',value=r'\1'))\
['Predictions'].apply(lambda x: ','.join(map(str,x)))

The above joins the values with a comma. To get output such as 82, replace ',' in the lambda with '' (an empty string), or use ' ' for a space-separated result.
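As a minimal sketch of the separator-free variant, here is the same groupby run on a small sample of the question's data. The variable names `prefix` and `result` are only illustrative, and the sample is cut down to five rows.

```python
import pandas as pd

# Small sample taken from the question's data
df = pd.DataFrame({
    "Key": ["C10D1", "C11D1", "C11D2", "C12D1", "C12D2"],
    "Predictions": [1, 8, 2, 2, 8],
})

# Group label: keep only the leading "C<digits>" part of each key
prefix = df["Key"].replace(regex=True, to_replace=r"(C[0-9]+).*", value=r"\1")

# Convert each group's predictions to strings and join them with no separator
result = df.groupby(prefix)["Predictions"].apply(lambda x: "".join(map(str, x)))
print(result)
```

Note that the joined values are strings, so C11 becomes '82' rather than the number 82.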

If you want to sum the values instead, could you please try the following:

df.groupby(df['Key'].replace(regex=True,to_replace=r'(C[0-9]+).*',value=r'\1')).sum()

Or, to reset the index as well, try:

df.groupby(df['Key'].replace(regex=True,to_replace=r'(C[0-9]+).*',value=r'\1')).sum()\
.reset_index()

Explanation of the above code:

df.groupby(df['Key'].replace(regex=True,to_replace=r'(C[0-9]+).*',value=r'\1')).sum()

df.groupby: groups the rows of df by whatever values are passed to it.
df['Key'].replace(regex=True,to_replace=r'(C[0-9]+).*',value=r'\1'): applies a regex to df's Key column that keeps only the leading C followed by digits and drops everything after it, as the OP's question requires.

.sum(): computes the total of the values in each group, i.e. of all rows whose keys share the same prefix.
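To make the difference between summing and concatenating concrete, the sketch below runs both on a few rows from the question. It swaps in `str.extract` to build the group label instead of the `replace` call used above, and it selects the Predictions column explicitly before aggregating. Both are choices made for this illustration, and the names `prefix`, `summed` and `joined` are hypothetical.

```python
import pandas as pd

# A few rows from the question's data
df = pd.DataFrame({
    "Key": ["C1D1", "C11D1", "C11D2", "C4D1", "C4D2"],
    "Predictions": [5, 8, 2, 1, 9],
})

# Pull out the leading "C<digits>" part of each key as the group label
prefix = df["Key"].str.extract(r"^(C[0-9]+)", expand=False)

# Numeric total per group
summed = df.groupby(prefix)["Predictions"].sum()

# String concatenation per group: cast to str first, then join each group
joined = df["Predictions"].astype(str).groupby(prefix).agg(lambda s: "".join(s))

print(summed)
print(joined)
```

For C11 the sum is 10 while the concatenation is '82', which is the output the question asks for.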
