Amoroso

Reputation: 1003

How to join strings in pandas column based on a condition

Given a dataframe:

    text  binary
1  apple       1
2    bee       0
3  cider       1
4  honey       0

I would like to get 2 lists: one = [apple cider], zero = [bee honey]

How do I join the strings in the 'text' column based on the group (1 or 0) they belong to in the column 'binary'?

I wrote for loops that check each row for whether binary is 1 or 0 and then append the value from the text column to a list. I was wondering if there's a more efficient way, given that in pandas we can join the text in a column by simply calling ' '.join(df.text). But how can we do it based on a condition?

--Follow up Question --

  binary   text1   text2  text3
0       1   hello    this  table
1       1   cider    that  chair
2       0     bee     how  mouse
3       0  winter  bottle    fan

I would like to do the same thing but with multiple text columns.

from collections import defaultdict
import pandas as pd

raw = defaultdict(list)
raw['text1'] = ['hello','cider','bee','winter']
raw['text2'] = ['this','that','how','bottle']
raw['text3'] = ['table','chair','mouse','fan']
raw['binary'] = [1,1,0,0]

df = pd.DataFrame.from_dict(raw)
text1 = df.groupby('binary').text1.apply(list)
text2 = df.groupby('binary').text2.apply(list)
text3 = df.groupby('binary').text3.apply(list)

How can I write something like:

for i in ['text1','text2','text3']:
        df.groupby('binary').i.apply(list)

Upvotes: 1

Views: 908

Answers (1)

MaxU - stand with Ukraine

Reputation: 210882

UPDATE: Follow up Question

One list for each text* column, grouped by the binary column:

In [56]: df.set_index('binary').stack().groupby(level=[0,1]).apply(list).unstack()
Out[56]:
                 text1          text2           text3
binary
0        [bee, winter]  [how, bottle]    [mouse, fan]
1       [hello, cider]   [this, that]  [table, chair]
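
The loop from the question can also be written with bracket indexing: `df.groupby('binary').i` looks up an attribute literally named `i`, whereas `grouped[col]` uses the value of the loop variable. A minimal sketch, assuming the follow-up frame from the question; the names `text_cols`, `grouped` and `lists_by_col` are illustrative and not part of the original posts:

```python
import pandas as pd

# Follow-up frame from the question, rebuilt from a plain dict
df = pd.DataFrame({
    'binary': [1, 1, 0, 0],
    'text1': ['hello', 'cider', 'bee', 'winter'],
    'text2': ['this', 'that', 'how', 'bottle'],
    'text3': ['table', 'chair', 'mouse', 'fan'],
})

text_cols = ['text1', 'text2', 'text3']  # hypothetical name for the columns to collect
grouped = df.groupby('binary')

# Select each column by name with brackets, then collect its values per group
lists_by_col = pd.DataFrame({col: grouped[col].apply(list) for col in text_cols})
print(lists_by_col)
```

This should give the same table of lists as the stack/unstack one-liner, at the cost of one groupby selection per column.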

One list for all text columns, grouped by the binary column:

In [54]: df.set_index('binary').stack().groupby(level=0).apply(list)
Out[54]:
binary
0      [bee, how, mouse, winter, bottle, fan]
1    [hello, this, table, cider, that, chair]
dtype: object

OLD answer:

IIUC, you can group by binary and apply list to the grouped text column:

In [8]: df.groupby('binary').text.apply(list)
Out[8]:
binary
0      [bee, honey]
1    [apple, cider]
Name: text, dtype: object

or:

In [10]: df.groupby('binary').text.apply(list).reset_index()
Out[10]:
   binary            text
0       0    [bee, honey]
1       1  [apple, cider]
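
If the goal is one joined string per group rather than a list (the question mentions ' '.join(df.text)), the same groupby can apply ' '.join instead of list. A minimal sketch, assuming the first frame from the question; `joined`, `one` and `zero` are illustrative names:

```python
import pandas as pd

# First frame from the question, with its 1-based row labels
df = pd.DataFrame(
    {'text': ['apple', 'bee', 'cider', 'honey'], 'binary': [1, 0, 1, 0]},
    index=[1, 2, 3, 4],
)

# Join the strings of each group with a space, keeping row order within the group
joined = df.groupby('binary')['text'].apply(' '.join)

one = joined.loc[1]   # text for rows where binary == 1
zero = joined.loc[0]  # text for rows where binary == 0
print(one, '|', zero)
```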

Upvotes: 1
