Bharat Sharma

Reputation: 1219

Compute new column based on keyword and split

I have a dataframe like this:

>>> df1
                         overall
0  class1-10/class2-11/class3-13
1  class3-31/class2-22/class1-23
2                abc/def/xyz/prq

I want to compute three columns, class1, class2 and class3, from the values in 'overall' whenever those keywords are present. Desired output:

                         overall  class1  class2  class3
0  class1-10/class2-11/class3-13      10      11      13
1  class3-31/class2-22/class1-23      23      22      31
2                abc/def/xyz/prq     NaN     NaN     NaN

How can this be done in a Pythonic way? Thanks.

Upvotes: 4

Views: 386

Answers (3)

DeepSpace

Reputation: 81604

It may be tempting to use str.extract, but as per the docs it returns only the first match in each string. str.extractall, on the other hand, returns a MultiIndexed dataframe that needs reshaping before it is usable. We will resort to df.apply.

import re

regex = re.compile(r'(class\d+)-(\d+)')

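def func(x):
    # Find every (class name, value) pair in this row's 'overall' string.
    data = regex.findall(x['overall'])
    for class_name, value in data:
        # Write the value into a column named after the class (values stay strings).
        df.loc[x.name, class_name] = value

# Called for its side effect on df; the return value is discarded.
df.apply(func, axis=1)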
print(df)

#                           overall class1 class2 class3
#  0  class1-10/class2-11/class3-13     10     11     13
#  1  class3-31/class2-22/class1-23     23     22     31
#  2                abc/def/xyz/prq    NaN    NaN    NaN
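
For comparison, the str.extractall route mentioned above can be made workable with a little reshaping. This is only a sketch: the named groups `name` and `value` and the variables `matches`, `wide` and `result` are introduced here for illustration, and it assumes each class name appears at most once per row (a repeated name would make `unstack` fail).

```python
import pandas as pd

df = pd.DataFrame({
    "overall": [
        "class1-10/class2-11/class3-13",
        "class3-31/class2-22/class1-23",
        "abc/def/xyz/prq",
    ]
})

# One row per match, indexed by (original row label, match number).
matches = df["overall"].str.extractall(r"(?P<name>class\d+)-(?P<value>\d+)")

# Drop the match number, then pivot the class names into columns.
wide = (
    matches.reset_index(level="match", drop=True)
    .set_index("name", append=True)["value"]
    .unstack("name")
    .astype(float)
)

# Rows without any match are missing from `wide`; the join brings them back as NaN.
result = df.join(wide)
print(result)
```

Unlike the apply version, this does not modify df in place, and the new columns come out numeric rather than as strings.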

Upvotes: 3

jpp

Reputation: 164693

One way without regex is to use try / except:

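import numpy as np
import pandas as pd

def splitter(x):
    try:
        # Sorting puts the items in class1, class2, class3 order before taking the values.
        return [int(i.split('-')[1]) for i in sorted(x.split('/'))]
    except IndexError:
        # An item without '-' has no value part, so fill the row with NaN.
        return [np.nan] * 3

df[['class1', 'class2', 'class3']] = df['overall'].apply(splitter).apply(pd.Series)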

print(df)

                         overall  class1  class2  class3
0  class1-10/class2-11/class3-13    10.0    11.0    13.0
1  class3-31/class2-22/class1-23    23.0    22.0    31.0
2                abc/def/xyz/prq     NaN     NaN     NaN

Upvotes: 1

Vivek Kalyanarangan

Reputation: 9081

Use:

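def split_cols(x):
    # x is one row of the dataframe (a Series).
    for item in x['overall'].split('/'):
        if item.startswith('class'):
            pairs = item.split('-')
            # pairs[0] is the class name, pairs[1] its value (kept as a string).
            x[pairs[0]] = pairs[1]
    return x

df.apply(split_cols, axis=1)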

Output

  class1 class2 class3                        overall
0     10     11     13  class1-10/class2-11/class3-13
1     23     22     31  class3-31/class2-22/class1-23
2    NaN    NaN    NaN                abc/def/xyz/prq

Explanation

The split_cols() function takes care of creating the extra columns.

It first splits the 'overall' string on / and checks whether each piece starts with class.

It then splits each matching piece on -, using the first part as the column name and the second part as the value for that column.

The function is then applied to every row with df.apply(..., axis=1).
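
The same steps can also be sketched without changing rows inside apply, by building one dict per row and letting pandas align the keys into columns. This is an illustrative variant, not the code above: `parse_classes`, `classes` and `result` are hypothetical names, and it assumes every value after the - is an integer.

```python
import pandas as pd

df = pd.DataFrame({
    "overall": [
        "class1-10/class2-11/class3-13",
        "class3-31/class2-22/class1-23",
        "abc/def/xyz/prq",
    ]
})

def parse_classes(text):
    """Return {class_name: value} for every 'classN-value' item in text."""
    parsed = {}
    for item in text.split("/"):
        # Keep only items that look like 'class...-value'.
        if item.startswith("class") and "-" in item:
            name, value = item.split("-", 1)
            parsed[name] = int(value)  # assumes the value is an integer
    return parsed

# One dict per row; keys missing from a row become NaN in that row.
classes = pd.DataFrame([parse_classes(s) for s in df["overall"]], index=df.index)
result = df.join(classes)
print(result)
```

Here 'overall' stays as the first column and the new columns are numeric; they are floats because the NaN in the last row forces that dtype.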

Upvotes: 1
