Reputation: 90
I have this DataFrame (df) that looks like this:
+-----------------+-----------+----------------+---------------------+--------------+-------------+
| Gene | Gene name | Tissue | Cell type | Level | Reliability |
+-----------------+-----------+----------------+---------------------+--------------+-------------+
| ENSG00000001561 | ENPP4 | adipose tissue | adipocytes | Low | Approved |
| ENSG00000001561 | ENPP4 | adrenal gland | glandular cells | High | Approved |
| ENSG00000001561 | ENPP4 | appendix | glandular cells | Medium | Approved |
| ENSG00000001561 | ENPP4 | appendix | lymphoid tissue | Low | Approved |
| ENSG00000001561 | ENPP4 | bone marrow | hematopoietic cells | Medium | Approved |
| ENSG00000002586 | CD99 | adipose tissue | adipocytes | Low | Supported |
| ENSG00000002586 | CD99 | adrenal gland | glandular cells | Medium | Supported |
| ENSG00000002586 | CD99 | appendix | glandular cells | Not detected | Supported |
| ENSG00000002586 | CD99 | appendix | lymphoid tissue | Not detected | Supported |
| ENSG00000002586 | CD99 | bone marrow | hematopoietic cells | High | Supported |
| ENSG00000002586 | CD99 | breast | adipocytes | Not detected | Supported |
| ENSG00000003056 | M6PR | adipose tissue | adipocytes | High | Approved |
| ENSG00000003056 | M6PR | adrenal gland | glandular cells | High | Approved |
| ENSG00000003056 | M6PR | appendix | glandular cells | High | Approved |
| ENSG00000003056 | M6PR | appendix | lymphoid tissue | High | Approved |
| ENSG00000003056 | M6PR | bone marrow | hematopoietic cells | High | Approved |
+-----------------+-----------+----------------+---------------------+--------------+-------------+
Expected output:
+-----------+--------+-------------------------------+
| Gene name | Level | Tissue |
+-----------+--------+-------------------------------+
| ENPP4 | Low | adipose tissue, appendix |
| ENPP4 | High | adrenal gland, bronchus |
| ENPP4 | Medium | appendix, breast, bone marrow |
| CD99 | Low | adipose tissue, appendix |
| CD99 | High | bone marrow |
| CD99 | Medium | adrenal gland |
| ... | ... | ... |
+-----------+--------+-------------------------------+
Code used (adapted from "multiple if else conditions in pandas dataframe and derive multiple columns"):
def text_df(df):
    if (df[df['Level'].str.match('High')]):
        return (df.assign(Level='High') + df['Tissue'].astype(str))
    elif (df[df['Level'].str.match('Medium')]):
        return (df.assign(Level='Medium') + df['Tissue'].astype(str))
    elif (df[df['Level'].str.match('Low')]):
        return (df.assign(Level='Low') + df['Tissue'].astype(str))

df = df.apply(text_df, axis=1)
Error: KeyError: ('Level', 'occurred at index 172')
I can't understand what I am doing wrong. Any suggestions?
Upvotes: 1
Views: 482
Reputation: 153460
Try:
df.groupby(['Gene name','Level'], as_index=False)['Cell type'].agg(', '.join)
Output:
| | Gene name | Level | Cell type |
|---:|:------------|:-------------|:----------------------------------------------------------------------------------------------------------------|
| 0 | CD99 | High | hematopoietic cells |
| 1 | CD99 | Low | adipocytes |
| 2 | CD99 | Medium | glandular cells |
| 3 | CD99 | Not detected | glandular cells , lymphoid tissue , adipocytes |
| 4 | ENPP4 | High | glandular cells |
| 5 | ENPP4 | Low | adipocytes , lymphoid tissue |
| 6 | ENPP4 | Medium | glandular cells , hematopoietic cells |
| 7 | M6PR | High | adipocytes , glandular cells , glandular cells , lymphoid tissue , hematopoietic cells |
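The snippet above joins 'Cell type', while the question's expected output lists tissues. A minimal sketch of the same groupby applied to 'Tissue' might look like the following. The small DataFrame is rebuilt from a few rows of the question's table, and the strip step is an assumption, added because the padded commas in the output above suggest the values carry surrounding spaces.

```python
import pandas as pd

# A few rows copied from the question's table (ENPP4 and part of CD99)
df = pd.DataFrame({
    "Gene name": ["ENPP4", "ENPP4", "ENPP4", "ENPP4", "ENPP4", "CD99", "CD99"],
    "Tissue": ["adipose tissue", "adrenal gland", "appendix", "appendix",
               "bone marrow", "adipose tissue", "bone marrow"],
    "Level": ["Low", "High", "Medium", "Low", "Medium", "Low", "High"],
})

# Assumption: values may be padded with spaces, so strip them defensively
for col in ["Gene name", "Tissue", "Level"]:
    df[col] = df[col].str.strip()

# One row per (Gene name, Level), with the tissues joined into one string
result = (
    df.groupby(["Gene name", "Level"], as_index=False)["Tissue"]
      .agg(", ".join)
)
print(result)
```

If a tissue can appear more than once within a group, calling drop_duplicates() on the three columns before grouping would keep each tissue from being listed twice.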
Update added per comments below:
(df.groupby(['Gene name','Level'], as_index=False)['Cell type']
.agg(','.join).set_index(['Gene name','Level'])['Cell type']
.unstack().reset_index())
Output:
| Gene name | High | Low | Medium | Not detected |
|:------------|:----------------------------------------------------------------------------------------------------------------|:---------------------------------------|:-------------------------------------------|:---------------------------------------------------------|
| CD99 | hematopoietic cells | adipocytes | glandular cells | glandular cells , lymphoid tissue , adipocytes |
| ENPP4 | glandular cells | adipocytes , lymphoid tissue | glandular cells , hematopoietic cells | nan |
| M6PR | adipocytes , glandular cells , glandular cells , lymphoid tissue , hematopoietic cells | nan | nan | nan |
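For the wide layout, a slightly shorter route is to leave the group keys in the index and unstack the 'Level' level directly, which avoids the as_index=False / set_index round trip. This is only a sketch on a few rows copied from the question, using 'Tissue' instead of 'Cell type'. Replacing the missing combinations with empty strings is an optional choice, not something the original answer does.

```python
import pandas as pd

# A few rows copied from the question's table
df = pd.DataFrame({
    "Gene name": ["ENPP4", "ENPP4", "ENPP4", "ENPP4", "ENPP4", "CD99", "CD99"],
    "Tissue": ["adipose tissue", "adrenal gland", "appendix", "appendix",
               "bone marrow", "adipose tissue", "bone marrow"],
    "Level": ["Low", "High", "Medium", "Low", "Medium", "Low", "High"],
})

# Group keys become a MultiIndex; unstack() pivots the inner level ('Level')
# into columns, giving one row per gene and one column per level
wide = (
    df.groupby(["Gene name", "Level"])["Tissue"]
      .agg(", ".join)
      .unstack()
      .fillna("")  # optional: show blanks instead of NaN for missing levels
)
print(wide.reset_index())
```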
Upvotes: 3