BlueMango
BlueMango

Reputation: 503

Merge two columns based on the name of columns

Say, I have this DataFrame:

pd.DataFrame({"a":[["a","aa"],np.nan],"b":[["b","bb"],["bbb"]],"c":[["c","cc"],["c","cc"]],"merge apple":[["d","dd"],["d","dd"]],"merge ball":[["e","ee"],["e","ee"]]})


       a       b      c     merge apple  merge ball
0   [a, aa] [b, bb] [c, cc] [d, dd]     [e, ee]
1   NaN      [bbb]  [c, cc] [d, dd]     [e, ee]

I want to merge the column that contains word "apple" to column a, and the column that contains word "ball" to column b. I want the resulting table to be like this.

       a               b              c
0   [a, aa, d, dd]  [b, bb, e, ee]  [c, cc]
1   [d, dd]         [bbb, e, ee]    [c, cc]

I can't do something like df['a'] = df['a'].str.cat(df['merge apple'],sep=" ") or df["a"] = df["a"] + df["merge apple"] because I don't have the exact name of the column. Also, I need to append the text in the list.

Upvotes: 0

Views: 48

Answers (2)

Pratap Alok Raj
Pratap Alok Raj

Reputation: 1206

You can do the below using filter function:

>>> x = pd.DataFrame({'a':[['a','aa'],['a']],'b':[['b','bb'],['bbb']],'c':[['c','cc'],['c','cc']],'merge apple':[['d','dd'],['d','dd']],'merge ball':[['e','ee'],['e','ee']]})

>>> x
         a        b        c merge apple merge ball      combined_a
0  [a, aa]  [b, bb]  [c, cc]     [d, dd]    [e, ee]  [a, aa, d, dd]
1      [a]    [bbb]  [c, cc]     [d, dd]    [e, ee]      [a, d, dd]

>>> y = list(x.columns.values)

>>> merge_columns_with_a = list(filter(lambda x: 'apple' in x, y))

>>> merge_columns_with_a
['merge apple']

>>> for i in merge_columns_with_a:
...     x['a'] += x[i]

>>> x
            a        b        c       merge apple merge ball
0  [a, aa, d, dd]  [b, bb]  [c, cc]     [d, dd]    [e, ee]  
1      [a, d, dd]    [bbb]  [c, cc]     [d, dd]    [e, ee]  

>>> merge_columns_with_b = list(filter(lambda x: 'ball' in x, y))

>>> merge_columns_with_b
['merge ball']

>>> for i in merge_columns_with_b:
...     x['b'] += x[i]

>>> x
            a               b        c       merge apple  merge ball 
0  [a, aa, d, dd]  [b, bb, e, ee]  [c, cc]     [d, dd]    [e, ee]
1      [a, d, dd]    [bbb, e, ee]  [c, cc]     [d, dd]    [e, ee]

Upvotes: 1

Henry Ecker
Henry Ecker

Reputation: 35626

Try stack + pivot_table:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [["a", "aa"], np.nan],
    "b": [["b", "bb"], ["bbb"]],
    "c": [["c", "cc"], ["c", "cc"]],
    "merge apple": [["d", "dd"], ["d", "dd"]],
    "merge ball": [["e", "ee"], ["e", "ee"]]
})

# Stack to long format:
df = df.stack(dropna=False).reset_index()

# Convert all columns to first letter
df['level_1'] = df['level_1'].str.split('merge ').str[-1].str[0]

# Pivot with aggfunc to flatten list of lists (nan safe)
df = df.pivot_table(index='level_0',
                    columns='level_1',
                    values=0,
                    aggfunc=lambda s: s.apply(pd.Series).stack().tolist()) \
    .rename_axis(None) \
    .rename_axis(None, axis=1)

print(df)

df:

                a               b        c
0  [a, aa, d, dd]  [b, bb, e, ee]  [c, cc]
1         [d, dd]    [bbb, e, ee]  [c, cc]

Upvotes: 1

Related Questions