Hello, I have this pandas DataFrame:
Key Predictions
C10D1 1
C11D1 8
C11D2 2
C12D1 2
C12D2 8
C13D1 3
C13D2 9
C14D1 4
C14D2 9
C15D1 8
C15D2 3
C1D1 5
C2D1 7
C3D1 4
C4D1 1
C4D2 9
C5D1 3
C5D2 2
C6D1 1
C6D2 0
C7D1 8
C7D2 6
C8D1 3
C8D2 3
C9D1 5
C9D2 1
I want to concatenate the values in the "Predictions" column for rows whose "Key" values share the same prefix (the part before "D"). For example, the "Key" column contains "C11D1" and "C11D2". Since both start with "C11", I would like to concatenate the "Predictions" values of those two rows. The result should therefore be:
Predictions
Key
C10 1
C11 82
C12 28
and so on
Upvotes: 0
Views: 82
Reputation: 133458
EDIT: Since the OP wants to concatenate the values that share the same key prefix, I am adding that solution here.
df.groupby(df['Key'].replace(regex=True,to_replace=r'(C[0-9]+).*',value=r'\1'))\
['Predictions'].apply(lambda x: ','.join(map(str,x)))
The code above joins the values with a comma (,). Change the separator in the lambda x: ',' part to an empty string ('') or a space as needed; with '' you get exactly 82 for C11.
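A minimal, self-contained sketch of the EDIT's approach, using a subset of the question's data and an empty separator instead of ',' so that C11 comes out as 82. The names df, group_key and result are only illustrative.

```python
import pandas as pd

# A subset of the question's data
df = pd.DataFrame({
    "Key": ["C10D1", "C11D1", "C11D2", "C12D1", "C12D2", "C1D1"],
    "Predictions": [1, 8, 2, 2, 8, 5],
})

# Group key: keep only "C" plus its digits, dropping the "D<n>" suffix
group_key = df["Key"].replace(regex=True, to_replace=r"(C[0-9]+).*", value=r"\1")

# Join each group's predictions as strings with no separator (row order is kept)
result = df.groupby(group_key)["Predictions"].apply(lambda x: "".join(map(str, x)))
print(result)
```

The result is a Series indexed by the new key. Note that groupby sorts the keys as strings, so C1 comes before C10, C11 and so on.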
Could you please try the following.
df.groupby(df['Key'].replace(regex=True,to_replace=r'(C[0-9]+).*',value=r'\1')).sum()
Or, to get Key back as a regular column, reset the index:
df.groupby(df['Key'].replace(regex=True,to_replace=r'(C[0-9]+).*',value=r'\1')).sum()\
.reset_index()
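A hedged note on the .sum() variants: depending on your pandas version (2.x in particular, where DataFrameGroupBy.sum() no longer defaults to numeric_only=True), grouping by a derived Series does not drop the original Key column, so its strings may get aggregated as well, and reset_index() can then fail because a Key column already exists. Selecting Predictions before summing sidesteps this. A minimal sketch, assuming a subset of the question's data:

```python
import pandas as pd

# A subset of the question's data
df = pd.DataFrame({
    "Key": ["C10D1", "C11D1", "C11D2", "C12D1", "C12D2"],
    "Predictions": [1, 8, 2, 2, 8],
})

# Group key: "C" plus its digits, e.g. "C11D2" -> "C11"
group_key = df["Key"].replace(regex=True, to_replace=r"(C[0-9]+).*", value=r"\1")

# Aggregate only the Predictions column, then turn the group key back into a column
summed = df.groupby(group_key)["Predictions"].sum().reset_index()
print(summed)
```

For C11 this gives 10 (8 + 2), which shows that .sum() adds the values rather than concatenating them.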
Explanation of the code above:
df.groupby(df['Key'].replace(regex=True,to_replace=r'(C[0-9]+).*',value=r'\1')).sum()
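df.groupby(...): groups the rows of df by whatever values are passed to it.
df['Key'].replace(regex=True,to_replace=r'(C[0-9]+).*',value=r'\1'): uses a regex on df's Key column to replace everything after C and its digits with nothing, keeping only the captured group \1. So C11D1 and C11D2 both become C11, as the OP's question requires.
.sum(): adds up the values within each group. Note that this is numeric addition (for C11, 8 + 2 = 10), not string concatenation; to get 82, use the join-based version from the EDIT.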
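The regex step is easiest to see on the keys alone. This sketch applies the answer's replace call to a few keys from the question and, for comparison, Series.str.extract with an anchored pattern; str.extract is a swapped-in alternative, not part of the original answer. The group key is "C plus all of its digits", so its length varies (C1 vs C15) rather than being a fixed number of characters.

```python
import pandas as pd

keys = pd.Series(["C10D1", "C11D1", "C11D2", "C1D1", "C15D2"], name="Key")

# The answer's approach: keep the captured "C<digits>" group, drop the rest
via_replace = keys.replace(regex=True, to_replace=r"(C[0-9]+).*", value=r"\1")

# Alternative (not from the answer): extract the same prefix directly
via_extract = keys.str.extract(r"^(C[0-9]+)", expand=False)

print(via_replace.tolist())  # ['C10', 'C11', 'C11', 'C1', 'C15']
print(via_extract.tolist())  # ['C10', 'C11', 'C11', 'C1', 'C15']
```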
Upvotes: 1