Reputation: 1003
Given a dataframe:
   text   binary
1  apple  1
2  bee    0
3  cider  1
4  honey  0
I would like to get 2 lists: one = [apple cider], zero = [bee honey]
How do I join the strings in the 'text' column based on the group (1 or 0) they belong to in the column 'binary'?
I wrote a for loop that checks, for each row, whether binary is 1 or 0 and appends the value in the text column to the matching list, but I was wondering if there is a more efficient way. In pandas we can join all the text in a column by simply calling ' '.join(df.text). How can we do that based on a condition?
--Follow up Question --
   binary  text1   text2   text3
0  1       hello   this    table
1  1       cider   that    chair
2  0       bee     how     mouse
3  0       winter  bottle  fan
I would like to do the same thing but with multiple text columns.
from collections import defaultdict
import pandas as pd

raw = defaultdict(list)
raw['text1'] = ['hello','cider','bee','winter']
raw['text2'] = ['this','that','how','bottle']
raw['text3'] = ['table','chair','mouse','fan']
raw['binary'] = [1,1,0,0]
df = pd.DataFrame.from_dict(raw)
text1 = df.groupby('binary').text1.apply(list)
text2 = df.groupby('binary').text2.apply(list)
text3 = df.groupby('binary').text3.apply(list)
How can I write something like:
for i in ['text1','text2','text3']:
    df.groupby('binary').i.apply(list)
Upvotes: 1
Views: 908
Reputation: 210882
UPDATE: Follow up Question
One list for each text* column, grouped by the binary column:
In [56]: df.set_index('binary').stack().groupby(level=[0,1]).apply(list).unstack()
Out[56]:
                 text1          text2           text3
binary
0        [bee, winter]  [how, bottle]    [mouse, fan]
1       [hello, cider]   [this, that]  [table, chair]
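The loop from the question can also be made to work directly. The attribute form `.i` looks for a column literally named "i", whereas bracket indexing accepts the loop variable. The following is a minimal sketch, assuming the follow-up DataFrame from the question; the names `text_cols`, `lists_by_col` and `table` are only illustrative.

```python
import pandas as pd

# Sample data from the follow-up question
df = pd.DataFrame({
    'binary': [1, 1, 0, 0],
    'text1': ['hello', 'cider', 'bee', 'winter'],
    'text2': ['this', 'that', 'how', 'bottle'],
    'text3': ['table', 'chair', 'mouse', 'fan'],
})

text_cols = ['text1', 'text2', 'text3']

# Bracket indexing takes the column name from the variable `col`;
# each value is a Series of lists indexed by `binary`.
lists_by_col = {col: df.groupby('binary')[col].apply(list) for col in text_cols}

# Optional: put the per-column Series side by side in one DataFrame
table = pd.DataFrame(lists_by_col)
print(table)
```

Here `lists_by_col['text1'][0]` holds the text1 values for the rows where binary is 0.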
One list for all text columns, grouped by the binary column:
In [54]: df.set_index('binary').stack().groupby(level=0).apply(list)
Out[54]:
binary
0 [bee, how, mouse, winter, bottle, fan]
1 [hello, this, table, cider, that, chair]
dtype: object
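Note that the stacked result above walks the table row by row (bee, how, mouse, then winter, bottle, fan). If a column-by-column order is preferred, one possible variation is to reshape with `melt` instead of `stack`. This is only a sketch, assuming the same follow-up DataFrame; `by_column` is an illustrative name.

```python
import pandas as pd

# Sample data from the follow-up question
df = pd.DataFrame({
    'binary': [1, 1, 0, 0],
    'text1': ['hello', 'cider', 'bee', 'winter'],
    'text2': ['this', 'that', 'how', 'bottle'],
    'text3': ['table', 'chair', 'mouse', 'fan'],
})

# melt puts all text1 rows first, then text2, then text3;
# groupby keeps that order inside each group.
by_column = df.melt(id_vars='binary').groupby('binary')['value'].apply(list)
print(by_column)
```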
OLD answer:
IIUC you can group by binary and apply list to the grouped text column:
In [8]: df.groupby('binary').text.apply(list)
Out[8]:
binary
0 [bee, honey]
1 [apple, cider]
Name: text, dtype: object
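To get the two separate lists the question asks for, or a joined string per group, the grouped result can be indexed by the group label. A minimal sketch, assuming the DataFrame from the original question; the variable names `one`, `zero`, `joined` and `ones_only` are illustrative.

```python
import pandas as pd

# Sample data from the original question
df = pd.DataFrame(
    {'text': ['apple', 'bee', 'cider', 'honey'], 'binary': [1, 0, 1, 0]},
    index=[1, 2, 3, 4],
)

# One list per group; the result is indexed by the values of `binary`
grouped = df.groupby('binary')['text'].apply(list)
one, zero = grouped[1], grouped[0]

# One space-joined string per group instead of a list
joined = df.groupby('binary')['text'].apply(' '.join)

# Conditional version of ' '.join(df.text) using a boolean mask
ones_only = ' '.join(df.loc[df['binary'] == 1, 'text'])
```

The boolean mask is the closest match to the ' '.join(df.text) idiom from the question; the groupby form is more convenient when all groups are needed at once.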
or:
In [10]: df.groupby('binary').text.apply(list).reset_index()
Out[10]:
binary text
0 0 [bee, honey]
1 1 [apple, cider]
Upvotes: 1