MaxB

Reputation: 458

Create list from boolean expression

I have a dataframe containing a column of boolean expressions, and I want to add another column that holds a list of the elements (operands) of each expression.

Example:

Name          Exp
 A     DDDD | LLLL & AAAA
 D     HHHH | DDDD | JJJJ
 O     UUUU & FFFF & RRRR

Desired result:

Name          Exp                   Exp List
 A     DDDD | LLLL & AAAA    ['DDDD','LLLL','AAAA']
 D     HHHH | DDDD | JJJJ    ['HHHH','DDDD','JJJJ']
 O     UUUU & FFFF & RRRR    ['UUUU','FFFF','RRRR']

Upvotes: 1

Views: 86

Answers (2)

Ajay Maity

Reputation: 760

The answer by @jezrael will fail if the Exp column contains other special characters.

This implementation works if you know the boolean operators will always be either | or &:

>>> import pandas as pd
>>> df = pd.DataFrame({'Name': ['A', 'D', 'O'],
...                    'Exp': ['DDDD  | L-LL & AAAA', 'HHHH | DDDD | JJJJ', 'UUUU& FFFF & RRRR']})
>>> df

    Name    Exp
0   A       DDDD  | L-LL & AAAA
1   D       HHHH | DDDD | JJJJ
2   O       UUUU& FFFF & RRRR

>>> # Split on | or &, consuming any whitespace on either side of the operator
>>> df['Exp List'] = df['Exp'].str.split(r'\s*[|&]\s*')

>>> df

    Name    Exp                  Exp List
0   A       DDDD  | L-LL & AAAA  [DDDD, L-LL, AAAA]
1   D       HHHH | DDDD | JJJJ   [HHHH, DDDD, JJJJ]
2   O       UUUU& FFFF & RRRR    [UUUU, FFFF, RRRR]
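
To make the difference between the two approaches concrete, here is a minimal sketch using only the standard `re` module on a single string. It assumes an operand containing a hyphen, as in the sample data above. The character-class pattern `\s*[|&]\s*` is an illustrative assumption: a compact way of writing "split on | or &, ignoring surrounding whitespace".

```python
import re

# Hypothetical expression with a hyphenated operand and uneven spacing.
expr = "DDDD  | L-LL & AAAA"

# Matching runs of letters breaks the hyphenated operand into two pieces.
letters_only = re.findall(r"[a-zA-Z]+", expr)

# Splitting on the operators keeps each operand whole.
split_on_ops = re.split(r"\s*[|&]\s*", expr)

print(letters_only)  # ['DDDD', 'L', 'LL', 'AAAA']
print(split_on_ops)  # ['DDDD', 'L-LL', 'AAAA']
```

The same patterns should behave the same way element by element when passed to Series.str.findall and Series.str.split.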

Upvotes: 1

jezrael

Reputation: 862851

Use Series.str.findall with the regex [a-zA-Z]+ to extract the words:

df['Exp List'] = df['Exp'].str.findall(r'[a-zA-Z]+')
# Alternative: \w+ also matches digits and underscores
#df['Exp List'] = df['Exp'].str.findall(r'\w+')
print(df)
  Name                 Exp            Exp List
0    A  DDDD | LLLL & AAAA  [DDDD, LLLL, AAAA]
1    D  HHHH | DDDD | JJJJ  [HHHH, DDDD, JJJJ]
2    O  UUUU & FFFF & RRRR  [UUUU, FFFF, RRRR]

A solution using Series.str.split, with escaped separators and optional whitespace around them:

df['Exp List'] = df['Exp'].str.split(r'\s*\|\s*|\s*&\s*')
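
For completeness, a self-contained sketch of the split approach on the question's sample data. The DataFrame construction is an assumption reconstructed from the table in the question. The pattern is passed without extra arguments and relies on pandas treating a multi-character pattern as a regular expression by default.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed construction).
df = pd.DataFrame({
    "Name": ["A", "D", "O"],
    "Exp": ["DDDD | LLLL & AAAA", "HHHH | DDDD | JJJJ", "UUUU & FFFF & RRRR"],
})

# Split on | or &, consuming optional whitespace on both sides of the operator.
df["Exp List"] = df["Exp"].str.split(r"\s*\|\s*|\s*&\s*")

print(df["Exp List"].tolist())
# [['DDDD', 'LLLL', 'AAAA'], ['HHHH', 'DDDD', 'JJJJ'], ['UUUU', 'FFFF', 'RRRR']]
```

If an expression may carry leading or trailing whitespace, chaining .str.strip() before .str.split() would likely be needed to avoid stray spaces in the first and last elements.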

Upvotes: 5
