Reputation: 117
Consider data like this:
import pandas as pd
df = pd.DataFrame({'Log': ["Msadr#3 <-CmdS='LinkSelect'", "ErrCommPortOpen [MSADR#4-N]", "a"]})
df['Aux Col'] = df['Log'].str.lower().str.find('msadr') + 6
I want to get the number that comes after "msadr" whenever it is present. That number sits 6 characters after the position returned by str.find. If "msadr" doesn't occur, str.find returns -1 and the value in 'Aux Col' will be 5.
So, for the rows in which df['Aux Col'] isn't 5, I'm trying to get the character of df['Log'] at the position given by df['Aux Col'].
However, when I try:
df.loc[df['Aux Col'] != 5, "#"] = df['Log'].str[df['Aux Col']]
It returns the following error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Why is it ambiguous and how can I fix it?
The expected result is
df = pd.DataFrame({'Log': ["Msadr#3 <-CmdS='LinkSelect'", "ErrCommPortOpen [MSADR#4-N]", "a"], '#': ['3','4','NaN']})
Upvotes: 0
Views: 502
Reputation: 92894
The exact expected result:
import re
df['#'] = df['Log'].str.extract(r'(?<=msadr#)(\d+)', flags=re.I, expand=False)
In [27]: df
Out[27]:
                           Log    #
0  Msadr#3 <-CmdS='LinkSelect'    3
1  ErrCommPortOpen [MSADR#4-N]    4
2                            a  NaN
Upvotes: 1
Reputation: 323376
Seems like
df['Log'].str.lower().str.extract(r'msadr#(\d+)')
Out[139]:
     0
0    3
1    4
2  NaN
To fix your code
[x[y:y+1] for x, y in zip(df['Log'], df['Aux Col'])]
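As for why the original line fails: as far as I can tell, .str[...] expects a single integer or a slice that is applied to every row, not a Series of per-row positions. When a Series is passed, pandas ends up evaluating it in a boolean context, which is what raises the "truth value of a Series is ambiguous" error. Below is a minimal sketch of the per-row indexing written out in full. It assumes the digit is always a single character located 6 positions after the start of "msadr"; the names pos and found are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Log': ["Msadr#3 <-CmdS='LinkSelect'",
                           "ErrCommPortOpen [MSADR#4-N]",
                           "a"]})

# str.find returns -1 when 'msadr' is absent, so keep the raw position
# and use it as the "found" mask instead of comparing against 5.
pos = df['Log'].str.lower().str.find('msadr')
found = pos != -1

# Index each string with its own offset (assumed to be 6 characters after
# the match start); rows without a match, or too short, get NaN.
df['#'] = [log[p + 6] if ok and p + 6 < len(log) else np.nan
           for log, p, ok in zip(df['Log'], pos, found)]

print(df)
```

Unlike the bare list comprehension, this writes NaN rather than an empty string for rows where "msadr" is missing. If the number can have more than one digit, the regex-based str.extract approach is probably the safer choice.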
Upvotes: 3