A.Rahman Mahmoud

Reputation: 328

Concatenate text from a list based on another list

I have two lists: one holds lines of text, and the other holds a value (0 or 1) for each line, as follows:

text = ['Hello, ','I need some help in here ','things are not working well ','so i posted ','this question here ','hoping to get some ','good ','answers ','out of you ','that\'s it ','thanks']
value = [1,1,0,1,1,1,0,1,1,0,1]

The goal is to concatenate consecutive lines whose value is 1, with each value of 0 ending the current run, to get this result in any way possible:

['Hello, I need some help in here ',
 'so i posted this question here hoping to get some ',
 'answers out of you ',
 'thanks']

I tried putting the data in a DataFrame, but then I didn't know how to proceed (using pandas in the solution is not a must):

import pandas as pd
df = pd.DataFrame(data={"text": text, "value": value})
print(df)
                            text  value
0                        Hello,       1
1      I need some help in here       1
2   things are not working well       0
3                   so i posted       1
4            this question here       1
5            hoping to get some       1
6                          good       0
7                       answers       1
8                    out of you       1
9                     that's it       0
10                        thanks      1

Waiting for some Answers

Upvotes: 0

Views: 89

Answers (3)

Sayandip Dutta

Reputation: 15872

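If you want a pandas approach, you can label each run of equal values with a comparison against pandas.Series.shift followed by pandas.Series.cumsum, group on those labels with pandas.DataFrame.groupby, join each group's text with transform and str.join, and then keep only the rows where value is 1: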

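>>> runs = df['value'].ne(df['value'].shift(1)).cumsum()  # one label per run of equal values
>>> # select only the text column so str.join is never applied to the integer column
>>> df.groupby(runs)[['text']].transform(' '.join)[df['value'].eq(1)].drop_duplicates()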
                                                 text
0                     Hello, I need some help in here
3   so i posted this question here hoping to get some
7                                  answers out of you
10                                             thanks

EXPLANATION

>>> df['value'].ne(df['value'].shift(1)).cumsum()
0     1
1     1
2     2
3     3
4     3
5     3
6     4
7     5
8     5
9     6
10    7
Name: value, dtype: int32

>>> df.groupby(df['value'].ne(df['value'].shift(1)).cumsum())[['text']].transform(' '.join)
                                                 text
0                     Hello, I need some help in here
1                     Hello, I need some help in here
2                         things are not working well
3   so i posted this question here hoping to get some
4   so i posted this question here hoping to get some
5   so i posted this question here hoping to get some
6                                                good
7                                  answers out of you
8                                  answers out of you
9                                           that's it
10                                             thanks
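For readers without pandas at hand, the run-labelling step shown in the first part of this explanation can be mimicked in plain Python. This is only a sketch of the same idea (compare each value with its predecessor, then take a running sum of the change flags); the names `changed` and `run_ids` are illustrative and not part of pandas.

```python
from itertools import accumulate

value = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1]

# True where an element differs from its predecessor; the first element
# always counts as a change, mirroring the comparison against shift(1).
changed = [True] + [cur != prev for prev, cur in zip(value, value[1:])]

# A running sum of the flags gives every run of equal values its own label.
run_ids = list(accumulate(int(flag) for flag in changed))

print(run_ids)  # [1, 1, 2, 3, 3, 3, 4, 5, 5, 6, 7]
```

The labels match the cumsum output listed earlier, so rows that share a label belong to the same run.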

If you don't need a DataFrame, you can use itertools.groupby over the zipped (text, value) pairs and group by the second element, i.e. value. Then join the text part of each group with str.join when the key is 1.

>>> from itertools import groupby
>>> [' '.join([*zip(*g)][0]) for k, g in groupby(zip(text, value), lambda x: x[1]) if k]
['Hello,  I need some help in here ',
 'so i posted  this question here  hoping to get some ',
 'answers  out of you ',
 'thanks']
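The one-liner can be unpacked into a small function for readability. The sketch below is one possible expansion; the function name `join_runs` and the `sep` parameter are illustrative assumptions. Because the strings in `text` already end with a space, it joins with an empty separator by default, which avoids the doubled spaces visible in the output above.

```python
from itertools import groupby


def join_runs(text, value, sep=""):
    """Join consecutive items of `text` whose matching `value` is truthy."""
    joined = []
    # groupby yields one (key, group) pair per run of equal flags.
    for flag, run in groupby(zip(text, value), key=lambda pair: pair[1]):
        if flag:
            joined.append(sep.join(item for item, _ in run))
    return joined


text = ['Hello, ', 'I need some help in here ', 'things are not working well ',
        'so i posted ', 'this question here ', 'hoping to get some ', 'good ',
        'answers ', 'out of you ', "that's it ", 'thanks']
value = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1]

print(join_runs(text, value))
```

Passing `sep=" "` reproduces the behaviour of the one-liner, doubled spaces included.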

Upvotes: 1

unbe_ing

Reputation: 195

Solution

A Pythonic approach without pandas:

text_ = [t for t, v in zip(text, value) if v == 1]

Description

This keeps each item of text whose corresponding item in value equals 1. Note that it only filters the list; consecutive kept items are not joined.

Output

['Hello, ', 'I need some help in here ', 'so i posted ', 'this question here ', 'hoping to get some ', 'answers ', 'out of you ', 'thanks']

Upvotes: 0

Tom Chen

Reputation: 265

There is no need to use Pandas:

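tmp_str = ""
results = []
for chunk, keep in zip(text, value):
    if keep:
        # extend the current run of kept lines
        tmp_str += chunk
    elif tmp_str:
        # a 0 ends the run; skip empty runs caused by leading or repeated 0s
        results.append(tmp_str)
        tmp_str = ""
if tmp_str:
    # flush the final run when the list ends on a 1
    results.append(tmp_str)
print(results)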
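As a sketch, the same loop can be wrapped in a function and checked against inputs that the question's data does not cover, such as a leading 0 or several 0s in a row. The function name `concat_runs` is illustrative, and the guard on the empty accumulator is an assumption about the desired behaviour (no empty strings in the result).

```python
def concat_runs(text, value):
    """Concatenate consecutive items of `text` flagged with a truthy value."""
    results = []
    current = ""
    for chunk, keep in zip(text, value):
        if keep:
            current += chunk
        elif current:
            # Close the run only if something was collected, so leading
            # or repeated 0s do not add empty strings.
            results.append(current)
            current = ""
    if current:
        # Flush the last run when the input ends on a kept item.
        results.append(current)
    return results


print(concat_runs(["a ", "b ", "c ", "d"], [0, 1, 0, 0]))  # ['b ']
print(concat_runs(["a ", "b ", "c ", "d"], [1, 1, 0, 1]))  # ['a b ', 'd']
```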
        

Upvotes: 2
