Reputation: 1219
I have a dataframe like this:
>>> df1
overall
0 class1-10/class2-11/class3-13
1 class3-31/class2-22/class1-23
2 abc/def/xyz/prq
I want to compute three columns, class1, class2 and class3, from the values found in 'overall' when they are present. Desired output:
overall class1 class2 class3
0 class1-10/class2-11/class3-13 10 11 13
1 class3-31/class2-22/class1-23 23 22 31
2 abc/def/xyz/prq NaN NaN NaN
How can this be done in a Pythonic way? Thanks
Upvotes: 4
Views: 386
Reputation: 81604
It may be tempting to use str.extract, but as per the docs it only returns the first match. On the other hand, str.extractall outputs a dataframe that is a bit too complex to work with. We will resort to df.apply.
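import re

regex = re.compile(r'(class\d+)-(\d+)')

def func(x):
    # x is one row; collect every (class name, value) pair in 'overall'
    data = regex.findall(x['overall'])
    for class_name, value in data:
        # x.name is the row label; .loc creates the column on first use
        df.loc[x.name, class_name] = value

df.apply(func, axis=1)
print(df)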
# overall class1 class2 class3
# 0 class1-10/class2-11/class3-13 10 11 13
# 1 class3-31/class2-22/class1-23 23 22 31
# 2 abc/def/xyz/prq NaN NaN NaN
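For comparison, here is a rough sketch of how the str.extractall output mentioned above could be reshaped into the same columns. The group names `name` and `value` and the variables `matches`, `wide` and `result` are my own choices for illustration, and the extracted values stay as strings:

```python
import pandas as pd

df = pd.DataFrame({'overall': ['class1-10/class2-11/class3-13',
                               'class3-31/class2-22/class1-23',
                               'abc/def/xyz/prq']})

# One row per match, indexed by (original row label, match number)
matches = df['overall'].str.extractall(r'(?P<name>class\d+)-(?P<value>\d+)')

# Drop the match counter, then pivot the class names into columns
wide = (matches.reset_index(level='match', drop=True)
               .set_index('name', append=True)['value']
               .unstack('name'))

# Rows without any match (row 2 here) get NaN from the left join
result = df.join(wide)
print(result)
```

This avoids writing into df from inside apply, at the cost of a slightly denser reshape step.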
Upvotes: 3
Reputation: 164693
One way without regex is to use try / except:
import numpy as np
import pandas as pd

def splitter(x):
    try:
        # sorting puts the pieces in class1, class2, class3 order
        return [int(i.split('-')[1]) for i in sorted(x.split('/'))]
    except IndexError:
        return [np.nan] * 3

df[['class1', 'class2', 'class3']] = df['overall'].apply(splitter).apply(pd.Series)
print(df)
overall class1 class2 class3
0 class1-10/class2-11/class3-13 10.0 11.0 13.0
1 class3-31/class2-22/class1-23 23.0 22.0 31.0
2 abc/def/xyz/prq NaN NaN NaN
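The values print as floats because NaN forces a float column. If whole numbers are preferred, one option (a sketch, assuming a pandas version that has the nullable `Int64` dtype; the small frame below just recreates the three result columns) is to cast afterwards:

```python
import numpy as np
import pandas as pd

# Stand-in for the three columns produced above
df = pd.DataFrame({'class1': [10.0, 23.0, np.nan],
                   'class2': [11.0, 22.0, np.nan],
                   'class3': [13.0, 31.0, np.nan]})

cols = ['class1', 'class2', 'class3']

# Nullable integer dtype keeps missing values without falling back to float
df = df.astype({c: 'Int64' for c in cols})
print(df.dtypes)
```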
Upvotes: 1
Reputation: 9081
Use:
def split_cols(x):
    for item in x['overall'].split('/'):
        if item.startswith('class'):
            pairs = item.split('-')
            x[pairs[0]] = pairs[1]
    return x

df.apply(split_cols, axis=1)
Output
class1 class2 class3 overall
0 10 11 13 class1-10/class2-11/class3-13
1 23 22 31 class3-31/class2-22/class1-23
2 NaN NaN NaN abc/def/xyz/prq
Explanation
The split_cols() function takes care of creating the extra columns.
It splits by / first and checks for the presence of class in each piece.
It then splits again on -, and makes a column named after the first part with the second part as its value.
The whole thing is then put through the apply function.
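The same steps can also be sketched without modifying the row inside apply, by parsing each string into a dict and building the extra columns in one go. The helper name `parse_classes` and the variables `extra` and `result` are hypothetical, and the values are left as strings:

```python
import pandas as pd

def parse_classes(text):
    """Return a dict such as {'class1': '10'} for each 'classN-value' piece."""
    parsed = {}
    for item in text.split('/'):
        # keep only pieces that look like 'class<something>-<value>'
        if item.startswith('class') and '-' in item:
            name, value = item.split('-', 1)
            parsed[name] = value
    return parsed

df = pd.DataFrame({'overall': ['class1-10/class2-11/class3-13',
                               'class3-31/class2-22/class1-23',
                               'abc/def/xyz/prq']})

# A list of dicts becomes one column per key; missing keys become NaN
extra = pd.DataFrame(df['overall'].map(parse_classes).tolist(), index=df.index)
result = df.join(extra)
print(result)
```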
Upvotes: 1