Ailurophile

Reputation: 3005

How to split/extract a new column and remove the extracted string from the column

I have a sample DataFrame:

import pandas as pd

data = {"col1": ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]}

df = pd.DataFrame(data)
print(df)


     col1
0   1 first 1
1   2 second 2
2     third 3
3   4 fourth 4

I want to extract the leading digit of each value into a new column and remove it from col1.

I tried extracting it with:

df["index"] = df["col1"].str.extract(r'(\d)')

         col1 index
0   1 first 1     1
1  2 second 2     2
2     third 3     3
3  4 fourth 4     4

Now I want to remove the extracted digit from col1, but if I use replace, both the leading and the trailing digits are removed. Also, "third 3" has no leading digit, so its index should be NaN rather than 3.

Desired output:

    col1    index
0   first 1     1
1   second 2    2
2   third 3     NaN
3   fourth 4    4
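To see why a plain replace removes both digits, compare an unanchored pattern with one anchored by ^. This is a small sketch on a standalone Series; the variable names s, unanchored and anchored are illustrative only.

```python
import pandas as pd

s = pd.Series(["1 first 1", "2 second 2", "third 3", "4 fourth 4"])

# Unanchored: \d matches every digit, so the trailing digit is removed too.
unanchored = s.str.replace(r"\d", "", regex=True)

# Anchored with ^: only a digit at the very start of the string is removed.
anchored = s.str.replace(r"^\d", "", regex=True)

print(unanchored.tolist())  # [' first ', ' second ', 'third ', ' fourth ']
print(anchored.tolist())    # [' first 1', ' second 2', 'third 3', ' fourth 4']
```

Note that the anchored version still keeps the space that followed the removed digit.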

Upvotes: 2

Views: 107

Answers (2)

Hamza usman ghani

Reputation: 2243

Use the regex pattern r'^(\d)', which matches a single digit only at the start of the string:

  • ^ anchors the match to the start of the string.
  • \d matches one digit; the parentheses capture it as a group for str.extract.

df["index"] = df.col1.str.extract(r"^(\d)")
df.col1 = df.col1.str.replace(r"^(\d)", "", regex=True)

print(df)

      col1   index
0    first 1     1
1   second 2     2
2    third 3   NaN
3   fourth 4     4
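The pattern ^(\d) removes only the digit, so col1 actually becomes " first 1" with a leading space, which is easy to miss because pandas right-aligns the values when printing. A minimal variant, assuming the digit is always followed by whitespace you also want gone, adds \s* to the replace pattern; expand=False makes str.extract return a Series instead of a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]})

# Extract first, while col1 still contains the leading digit.
# expand=False returns a Series (NaN where there is no leading digit).
df["index"] = df["col1"].str.extract(r"^(\d)", expand=False)

# Remove the leading digit *and* any whitespace right after it.
df["col1"] = df["col1"].str.replace(r"^\d\s*", "", regex=True)

print(df)
```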

Upvotes: 0

jezrael

Reputation: 862611

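Use Series.str.replace and Series.str.extract inside DataFrame.assign to build each column separately. Both expressions are evaluated on the original df["col1"] before assign creates the new columns, so the extraction still sees the leading digit:

# ^ anchors the pattern to the start of the string
pat = r'(^\d)'
df = df.assign(col1=df["col1"].str.replace(pat, '', regex=True),
               index=df["col1"].str.extract(pat))
print(df)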
        col1 index
0    first 1     1
1   second 2     2
2    third 3   NaN
3   fourth 4     4
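An alternative not taken from either answer: a single str.extract with two named groups can produce both columns in one pass. In this sketch the group names index and col1 are my own choice (str.extract turns named groups into column names), and it assumes the leading digit is separated from the rest by whitespace; when the optional group does not participate, as for "third 3", its column gets NaN.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]})

# (?:...)? makes "digit + whitespace" optional at the start of the string;
# (?P<col1>.*) captures whatever remains.
pat = r"^(?:(?P<index>\d)\s+)?(?P<col1>.*)$"

parts = df["col1"].str.extract(pat)

# Reorder so the columns match the desired output.
df = parts[["col1", "index"]]
print(df)
```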

Upvotes: 4
