Reputation: 3005
I have a sample dataframe:
import pandas as pd

data = {"col1": ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]}
df = pd.DataFrame(data)
print(df)
         col1
0   1 first 1
1  2 second 2
2     third 3
3  4 fourth 4
I want to extract the leading digit from each value in the column and remove it.
I tried to extract it using:
df["index"] = df["col1"].str.extract(r'(\d)')
         col1 index
0   1 first 1     1
1  2 second 2     2
2     third 3     3
3  4 fourth 4     4
I want to remove the extracted digit from col1, but if I use replace, both the start and end digits are replaced.
Desired output:
       col1 index
0   first 1     1
1  second 2     2
2   third 3   NaN
3  fourth 4     4
Upvotes: 2
Views: 107
Reputation: 2243
Use the regex pattern '^(\d)', which matches a single digit at the start of the string.
df["index"] = df.col1.str.extract(r"^(\d)")
df.col1 = df.col1.str.replace(r"^(\d)", "", regex=True)
print(df)
        col1 index
0    first 1     1
1   second 2     2
2    third 3   NaN
3   fourth 4     4
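To see why the `^` anchor matters, here is a minimal sketch using Python's standard `re` module on the sample strings from the question. The names `samples`, `unanchored`, `anchored` and `leading` are just illustrative, and plain `re` stands in for the pandas string methods here only to show the pattern behaviour in isolation.

```python
import re

samples = ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]

# Without the anchor, every digit is removed, including the trailing one.
unanchored = [re.sub(r"\d", "", s) for s in samples]

# With ^, only a digit at the very start of the string is removed.
anchored = [re.sub(r"^\d", "", s) for s in samples]

# With ^, a string that does not start with a digit gives no match,
# which is the case that shows up as NaN in the str.extract result.
leading = []
for s in samples:
    m = re.search(r"^(\d)", s)
    leading.append(m.group(1) if m else None)

print(unanchored)  # [' first ', ' second ', 'third ', ' fourth ']
print(anchored)    # [' first 1', ' second 2', 'third 3', ' fourth 4']
print(leading)     # ['1', '2', None, '4']
```

Note that the anchored version still leaves the space that followed the digit.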
Upvotes: 0
Reputation: 862611
Use Series.str.replace with Series.str.extract, combined through DataFrame.assign so that each column is processed separately:
# added ^ to anchor the match at the start of the string
pat = r'(^\d)'
df = df.assign(col1=df["col1"].str.replace(pat, '', regex=True),
               index=df["col1"].str.extract(pat))
print(df)
        col1 index
0    first 1     1
1   second 2     2
2    third 3   NaN
3   fourth 4     4
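One detail the printed frame hides: removing only the digit leaves the space that followed it, so the values are ' first 1' rather than 'first 1'. If that leading space is unwanted (an assumption, the question does not say), a possible variant is to let the removal pattern consume the whitespace too. The names `pat_extract`, `pat_remove` and `out` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["1 first 1", "2 second 2", "third 3", "4 fourth 4"]})

pat_extract = r"^(\d)"   # capture one digit at the start of the string
pat_remove = r"^\d\s*"   # that leading digit plus any whitespace after it

# Both expressions are evaluated on the original df["col1"] before assign
# runs, so the extraction still sees the digit that the replace removes.
out = df.assign(
    col1=df["col1"].str.replace(pat_remove, "", regex=True),
    index=df["col1"].str.extract(pat_extract, expand=False),
)
print(out)
```

Passing `expand=False` makes `str.extract` return a Series for the single capture group instead of a one-column DataFrame.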
Upvotes: 4