How to split/extract a new column and remove the extracted string from the column

Question

I have a sample dataframe:

import pandas as pd

data = {"col1": ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]}

df = pd.DataFrame(data)

print(df)


     col1
0   1 first 1
1   2 second 2
2     third 3
3   4 fourth 4

I want to extract the leading digit of each value into a new column and remove it from col1.

I tried extracting it with

df["index"] = df["col1"].str.extract(r'(\d)')

    col1       index
0   1 first 1   1
1   2 second 2  2
2   third 3     3
3   4 fourth 4  4

I want to remove the extracted digit from col1. If I use replace, both the leading and the trailing digits are replaced.

Desired Output

    col1    index
0   first 1     1
1   second 2    2
2   third 3     NaN
3   fourth 4    4

jezrael · Accepted Answer

Use Series.str.replace to strip the digit and Series.str.extract to capture it, combined in DataFrame.assign so that both columns are computed from the original col1:

# ^ anchors the match to the start of the string
pat = r'(^\d)'
df = df.assign(col1=df["col1"].str.replace(pat, '', regex=True),
               index=df["col1"].str.extract(pat))
print(df)
        col1 index
0    first 1     1
1   second 2     2
2    third 3   NaN
3   fourth 4     4
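
Note that removing only the digit leaves the space that followed it at the start of col1. If that matters, one possible variation (a sketch, not part of the answer above) is to let the pattern also consume the whitespace after the digit. The `\s*` suffix, the `expand=False` argument, and the name `result` are choices made for this example, on the assumption that the leading space should be dropped too:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]})

# One digit at the start of the string (captured), then any whitespace.
# Assumption: the whitespace after the digit should be removed as well.
pat = r"^(\d)\s*"

result = df.assign(
    # Remove the leading digit and the whitespace after it.
    col1=df["col1"].str.replace(pat, "", regex=True),
    # expand=False returns a Series; rows without a leading digit become NaN.
    index=df["col1"].str.extract(pat, expand=False),
)
print(result)
```

Both keyword arguments are evaluated against the original `df["col1"]` before `assign` runs, so the extraction still sees the digit that the replacement strips out.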
