Luan Vieira

Reputation: 117

How to extract text in pandas depending on value on position in string?

Consider data like this:

import pandas as pd

df = pd.DataFrame({'Log': ["Msadr#3 <-CmdS='LinkSelect'", "ErrCommPortOpen [MSADR#4-N]", "a"]})
df['Aux Col'] = df['Log'].str.lower().str.find('msadr') + 6

I want to get the digit that follows "msadr#" whenever it appears in the string; it sits 6 characters after the position returned by str.find. If "msadr" is not found, str.find returns -1, so the value in 'Aux Col' is 5.

So, for the rows where df['Aux Col'] isn't 5, I'm trying to get the character at position df['Aux Col'] in df['Log'].

However, when I try:

df.loc[df['Aux Col'] != 5, "#"] = df['Log'].str[df['Aux Col']]

It returns the following error message:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Why is it ambiguous and how can I fix it?

The expected result is:

df = pd.DataFrame({'Log': ["Msadr#3 <-CmdS='LinkSelect'", "ErrCommPortOpen [MSADR#4-N]", "a"], '#': ['3','4','NaN']})

Upvotes: 0

Views: 502

Answers (2)

RomanPerekhrest

Reputation: 92894

This gives exactly the expected result:

import re

df['#'] = df['Log'].str.extract(r'(?<=msadr#)(\d+)', flags=re.I, expand=False)

In [27]: df
Out[27]: 
                           Log    #
0  Msadr#3 <-CmdS='LinkSelect'    3
1  ErrCommPortOpen [MSADR#4-N]    4
2                            a  NaN
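
For anyone unpacking the pattern, here is a minimal sketch of an equivalent form without the lookbehind. It assumes the same sample frame as in the question. Since only the capture group decides what str.extract returns, the literal "msadr#" prefix can sit outside the group. flags=re.IGNORECASE (the long name for re.I) covers both "Msadr" and "MSADR", and expand=False returns a Series instead of a one-column DataFrame.

```python
import re
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'Log': ["Msadr#3 <-CmdS='LinkSelect'", "ErrCommPortOpen [MSADR#4-N]", "a"]})

# Only the text matched by the capture group (\d+) is returned;
# rows where the pattern does not match come back as NaN.
df['#'] = df['Log'].str.extract(r'msadr#(\d+)', flags=re.IGNORECASE, expand=False)
```

Unlike the fixed "+ 6" offset, \d+ also picks up numbers with more than one digit.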

Upvotes: 1

BENY

Reputation: 323376

It seems you need str.extract:

df['Log'].str.lower().str.extract(r'msadr#(\d+)')
Out[139]: 
     0
0    3
1    4
2  NaN

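To fix your code, index each string with its own position, row by row (note that this does not skip the rows where 'Aux Col' is 5):

[x[y:y+1] for x, y in zip(df['Log'], df['Aux Col'])]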
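
On the "why is it ambiguous" part: as far as I can tell, .str[...] expects a single integer (or a slice) that is applied to every row, and internally that integer is compared against each string's length. With a Series in its place the comparison produces a Series of booleans, which pandas refuses to collapse into one True/False, hence the ValueError. Below is a minimal sketch of a row-wise version that also leaves NaN where "msadr" is missing. The name pos is only for illustration, and the fixed offset of 6 is taken from the question.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'Log': ["Msadr#3 <-CmdS='LinkSelect'", "ErrCommPortOpen [MSADR#4-N]", "a"]})

# Position of "msadr" in each lower-cased string, -1 when it is absent.
pos = df['Log'].str.lower().str.find('msadr')

# Pair each string with its own position; take the character 6 places
# after the match, or NaN when there is no match or the string is too short.
df['#'] = [
    s[p + 6] if p != -1 and p + 6 < len(s) else np.nan
    for s, p in zip(df['Log'], pos)
]
```

This keeps the original idea of a per-row offset, but a regex with str.extract is usually the simpler route.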

Upvotes: 3
