Pandas: How to extract a string from another string

Question

I have a column with 8000 rows, and I need to create a new column whose values are extracted from the existing column.

The strings look like this:

TP-ETU06-01-525-W-133

I want to create two new columns from this string: the first takes the second field, which is ETU06, and the second takes the last field, which is 133.

I have done this by using:

df["sys_no"] = df.apply(lambda x:x["test_no"].split("-")[1] if (pd.notnull(x["test_no"]) and x["test_no"]!="" and len(x["test_no"].split("-"))>0) else None,axis=1)

df["package_no"] = df.apply(lambda x:x["test_no"].split("-")[-1] if (pd.notnull(x["test_no"]) and x["test_no"]!="" and len(x["test_no"].split("-"))>0) else None,axis=1)

It works fine, but the existing column also contains random strings that don't follow this pattern. I want the new columns left empty for those rows.

How should I change my script?

Thank you

jezrael · Accepted Answer

Use Series.str.contains to build a mask, split the values with Series.str.split, and then assign the second and last items by indexing only the rows selected by the mask:

print (df)
                 test_no
0              temp data
1                    NaN
2  TP-ETU06-01-525-W-133

mask = df["test_no"].str.contains('-', na=False)
splitted = df["test_no"].str.split("-")
df.loc[mask, "sys_no"] = splitted[mask].str[1]
df.loc[mask, "package_no"] = splitted[mask].str[-1]
print (df)
                 test_no sys_no package_no
0              temp data    NaN        NaN
1                    NaN    NaN        NaN
2  TP-ETU06-01-525-W-133  ETU06        133
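
If the stray values can themselves contain a hyphen, the mask above lets them through: a value such as a-b passes str.contains('-') and would fill both new columns. A stricter alternative is a regular expression with Series.str.extract, which returns NaN for any row that does not match the whole pattern. The following is a minimal sketch, not part of the answer above; the pattern assumes every valid test_no has exactly six non-empty, hyphen-separated fields, which is inferred from the single example in the question, and the a-b row is hypothetical sample data.

```python
import pandas as pd

# Sample data from the answer above, plus one hypothetical malformed row
# ("a-b") that contains a hyphen but does not follow the expected layout.
df = pd.DataFrame(
    {"test_no": ["temp data", None, "TP-ETU06-01-525-W-133", "a-b"]}
)

# Assumed layout: six hyphen-separated, non-empty fields.
# The named groups become the column names of the extracted frame.
pattern = (
    r"^[^-]+-(?P<sys_no>[^-]+)-[^-]+-[^-]+-[^-]+-(?P<package_no>[^-]+)$"
)

# str.extract returns one column per group, with NaN where the pattern
# does not match or the input is missing; join aligns on the index.
df = df.join(df["test_no"].str.extract(pattern))

print(df)
```

Only the row that matches the full layout gets values; the other rows, including a-b, stay empty in both new columns. If valid strings can have a different number of fields, the pattern needs to be adjusted accordingly.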
