Reputation: 503
Say, I have this DataFrame:
pd.DataFrame({"a":[["a","aa"],np.nan],"b":[["b","bb"],["bbb"]],"c":[["c","cc"],["c","cc"]],"merge apple":[["d","dd"],["d","dd"]],"merge ball":[["e","ee"],["e","ee"]]})
a b c merge apple merge ball
0 [a, aa] [b, bb] [c, cc] [d, dd] [e, ee]
1 NaN [bbb] [c, cc] [d, dd] [e, ee]
I want to merge the column that contains word "apple" to column a, and the column that contains word "ball" to column b. I want the resulting table to be like this.
a b c
0 [a, aa, d, dd] [b, bb, e, ee] [c, cc]
1 [d, dd] [bbb, e, ee] [c, cc]
I can't do something like df['a'] = df['a'].str.cat(df['merge apple'],sep=" ")
or df["a"] = df["a"] + df["merge apple"]
because I don't have the exact name of the column. Also, I need to append the text in the list.
Upvotes: 0
Views: 48
Reputation: 1206
You can do the below using filter function:
>>> x = pd.DataFrame({'a':[['a','aa'],['a']],'b':[['b','bb'],['bbb']],'c':[['c','cc'],['c','cc']],'merge apple':[['d','dd'],['d','dd']],'merge ball':[['e','ee'],['e','ee']]})
>>> x
a b c merge apple merge ball combined_a
0 [a, aa] [b, bb] [c, cc] [d, dd] [e, ee] [a, aa, d, dd]
1 [a] [bbb] [c, cc] [d, dd] [e, ee] [a, d, dd]
>>> y = list(x.columns.values)
>>> merge_columns_with_a = list(filter(lambda x: 'apple' in x, y))
>>> merge_columns_with_a
['merge apple']
>>> for i in merge_columns_with_a:
... x['a'] += x[i]
>>> x
a b c merge apple merge ball
0 [a, aa, d, dd] [b, bb] [c, cc] [d, dd] [e, ee]
1 [a, d, dd] [bbb] [c, cc] [d, dd] [e, ee]
>>> merge_columns_with_b = list(filter(lambda x: 'ball' in x, y))
>>> merge_columns_with_b
['merge ball']
>>> for i in merge_columns_with_b:
... x['b'] += x[i]
>>> x
a b c merge apple merge ball
0 [a, aa, d, dd] [b, bb, e, ee] [c, cc] [d, dd] [e, ee]
1 [a, d, dd] [bbb, e, ee] [c, cc] [d, dd] [e, ee]
Upvotes: 1
Reputation: 35626
Try stack
+ pivot_table
:
import numpy as np
import pandas as pd
df = pd.DataFrame({
"a": [["a", "aa"], np.nan],
"b": [["b", "bb"], ["bbb"]],
"c": [["c", "cc"], ["c", "cc"]],
"merge apple": [["d", "dd"], ["d", "dd"]],
"merge ball": [["e", "ee"], ["e", "ee"]]
})
# Stack to long format:
df = df.stack(dropna=False).reset_index()
# Convert all columns to first letter
df['level_1'] = df['level_1'].str.split('merge ').str[-1].str[0]
# Pivot with aggfunc to flatten list of lists (nan safe)
df = df.pivot_table(index='level_0',
columns='level_1',
values=0,
aggfunc=lambda s: s.apply(pd.Series).stack().tolist()) \
.rename_axis(None) \
.rename_axis(None, axis=1)
print(df)
df
:
a b c
0 [a, aa, d, dd] [b, bb, e, ee] [c, cc]
1 [d, dd] [bbb, e, ee] [c, cc]
Upvotes: 1