Reputation: 1
I have a CSV file with a text column. Please find sample data below:
text
['Hello world', 'Welcome to the universe.']
['Hey Hello world', 'I am learning Pandas Welcome to the universe.']
['Hello world how are you', 'Good Morning', 'I am learning Pandas.']
['Hi', 'Iam version 3.6 Welcome', 'Nice to meet you.']
I want to iterate over each row and check for these patterns:
If a sentence contains Hello or world, I want to place that sentence in a new column, text1.
If a sentence contains Welcome or universe, I want to place that sentence in a new column, text2.
After searching for the patterns and placing the matches in the new columns, my output should look like this:
text,text1,text2
['Hello world', 'Welcome to the universe.'],Hello world,Welcome to the universe.
['Hey Hello world', 'I am learning Pandas Welcome to the universe.'],Hey Hello world,I am learning Pandas Welcome to the universe.
['Hello world how are you', 'Good Morning', 'I am learning Pandas.'],Hello world how are you,None
['Hi', 'Iam version 3.6 Welcome', 'Nice to meet you.'],None,Iam version 3.6 Welcome
Can anyone please guide me on how to do this?
Upvotes: 0
Views: 73
Reputation: 3455
From your DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame({'text': ["['Hello world', 'Welcome to the universe.']",
... "['Hey Hello world', 'I am learning Pandas Welcome to the universe.']",
... "['Hello world how are you', 'Good Morning', 'I am learning Pandas.']",
... "['Hi', 'Iam version 3.6 Welcome', 'Nice to meet you.']"]},
... index = [0, 1, 2, 3])
>>> df
text
0 ['Hello world', 'Welcome to the universe.']
1 ['Hey Hello world', 'I am learning Pandas Welc...
2 ['Hello world how are you', 'Good Morning', 'I...
3 ['Hi', 'Iam version 3.6 Welcome', 'Nice to mee...
We can apply two functions, find_substring_text1 and find_substring_text2, to the text column, after each string in it has been parsed into a list with eval:
>>> def find_substring_text1(row):
... return [s for s in row if any(k in s for k in ['Hello', 'world'])]
>>> def find_substring_text2(row):
... return [s for s in row if any(k in s for k in ['Welcome', 'universe'])]
>>> df['text1'] = df['text'].apply(eval).apply(find_substring_text1)
>>> df['text2'] = df['text'].apply(eval).apply(find_substring_text2)
Then we get the expected result:
>>> df
text text1 text2
0 ['Hello world', 'Welcome to the universe.'] [Hello world] [Welcome to the universe.]
1 ['Hey Hello world', 'I am learning Pandas Welc... [Hey Hello world] [I am learning Pandas Welcome to the universe.]
2 ['Hello world how are you', 'Good Morning', 'I... [Hello world how are you] []
3 ['Hi', 'Iam version 3.6 Welcome', 'Nice to mee... [] [Iam version 3.6 Welcome]
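Note that eval executes whatever code the string contains, which can be risky if the CSV comes from an untrusted source. A minimal sketch of the same approach using ast.literal_eval from the standard library, which only parses Python literals. The helper name find_sentences is hypothetical and simply generalizes the two functions above by taking the keywords as a parameter:

```python
import ast

import pandas as pd

df = pd.DataFrame({'text': [
    "['Hello world', 'Welcome to the universe.']",
    "['Hey Hello world', 'I am learning Pandas Welcome to the universe.']",
    "['Hello world how are you', 'Good Morning', 'I am learning Pandas.']",
    "['Hi', 'Iam version 3.6 Welcome', 'Nice to meet you.']",
]})


def find_sentences(sentences, keywords):
    """Return the sentences that contain at least one of the keywords.

    Hypothetical helper, not part of pandas.
    """
    return [s for s in sentences if any(k in s for k in keywords)]


# literal_eval only accepts literals (lists, strings, numbers, ...),
# so a malicious cell cannot run arbitrary code.
parsed = df['text'].apply(ast.literal_eval)

df['text1'] = parsed.apply(lambda row: find_sentences(row, ['Hello', 'world']))
df['text2'] = parsed.apply(lambda row: find_sentences(row, ['Welcome', 'universe']))
```

Parsing the column once and reusing the result also avoids evaluating every string twice.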
If needed, we can even convert each list to a string, like so:
>>> df['text1'] = [','.join(map(str, l)) for l in df['text1']]
>>> df['text2'] = [','.join(map(str, l)) for l in df['text2']]
>>> df
text text1 text2
0 ['Hello world', 'Welcome to the universe.'] Hello world Welcome to the universe.
1 ['Hey Hello world', 'I am learning Pandas Welc... Hey Hello world I am learning Pandas Welcome to the universe.
2 ['Hello world how are you', 'Good Morning', 'I... Hello world how are you
3 ['Hi', 'Iam version 3.6 Welcome', 'Nice to mee... Iam version 3.6 Welcome
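The join above leaves an empty string where nothing matched, while the expected output in the question shows None. A small sketch of one way to get a missing value instead, assuming the columns still hold lists at this point. The helper name join_or_none is hypothetical, and the two-row frame is only illustrative data:

```python
import pandas as pd

# Illustrative data: each cell holds the list of matched sentences.
df = pd.DataFrame({
    'text1': [['Hello world'], []],
    'text2': [[], ['Iam version 3.6 Welcome']],
})


def join_or_none(matches, sep=','):
    """Join matched sentences into one string, or return None if the list is empty.

    Hypothetical helper, not part of pandas.
    """
    return sep.join(matches) if matches else None


for col in ['text1', 'text2']:
    df[col] = df[col].apply(join_or_none)
```

Rows without a match then hold a missing value rather than an empty string, so they can be detected with df['text1'].isna().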
Upvotes: 1