Reputation: 588
I have a dataframe df as defined below:
import pandas as pd
df = pd.DataFrame(
{
"ID": [1, 2, 3, 4, 5],
"name": [
"Hello Kitty how=1234 when=2345",
"how=3456 Hello Puppy when=7685",
"how=646 It is an Helloexample when=9089",
"for how=6574 stackoverflow when=5764",
"Hello when=3632 World how=7654",
],
}
)
df
Out[100]:
   ID                                     name
0   1           Hello Kitty how=1234 when=2345
1   2           how=3456 Hello Puppy when=7685
2   3  how=646 It is an Helloexample when=9089
3   4     for how=6574 stackoverflow when=5764
4   5           Hello when=3632 World how=7654
I want to extract the values that follow how= and when= into two separate columns, how and when. How can I do this using a regular expression?
For example, for the first record I should get 1234 in column how and 2345 in column when. For the last record I should get 7654 in column how and 3632 in column when.
Upvotes: 1
Views: 1494
Reputation: 31011
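Use df.name.str.extract(...). The first argument of this method is the pattern. Include two named capturing groups in it, one for each fragment to capture. Because str.extract returns only the first match in each string, joining the two groups with | would fill just one of the columns per row. Placing each group inside a lookahead anchored at the start of the string lets both match regardless of their order.
Something like:
df.name.str.extract(r'^(?=.*how=(?P<how>[\d.]+))(?=.*when=(?P<when>[\d.]+))')
The pattern should be passed as a raw string because it contains backslashes.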
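As a minimal sketch of the single-call approach with two named groups, the example below wraps each group in a lookahead so that the order of how= and when= in the string does not matter. The lookahead form and the use of \d+ (integers only) are assumptions made for this illustration, not something the question requires.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Hello Kitty how=1234 when=2345",
            "how=3456 Hello Puppy when=7685",
            "how=646 It is an Helloexample when=9089",
            "for how=6574 stackoverflow when=5764",
            "Hello when=3632 World how=7654",
        ],
    }
)

# Each lookahead scans from the start of the string, so both groups
# can match no matter which key appears first.
pattern = r"^(?=.*how=(?P<how>\d+))(?=.*when=(?P<when>\d+))"

# The named groups become the column names of the returned DataFrame.
extracted = df["name"].str.extract(pattern)

# Attach the new columns to the original frame.
result = df.join(extracted)
print(result)
```

With this pattern, a row that lacks either key fails the whole match, so both columns are left missing for that row. The extracted values are strings.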
Upvotes: 0
Reputation: 82815
Use str.extract once per key.
Ex:
import pandas as pd
df = pd.DataFrame(
{
"ID": [1, 2, 3, 4, 5],
"name": [
"Hello Kitty how=1234 when=2345",
"how=3456 Hello Puppy when=7685",
"how=646 It is an Helloexample when=9089",
"for how=6574 stackoverflow when=5764",
"Hello when=3632 World how=7654",
],
}
)
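# expand=False returns a Series, which suits a single-column assignment.
# Use (\d+) instead of (\w+) if the values are always integers.
df['when'] = df['name'].str.extract(r"when=(\w+)", expand=False)
df['how'] = df['name'].str.extract(r"how=(\w+)", expand=False)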
print(df)
Output:
   ID                                     name  when   how
0   1           Hello Kitty how=1234 when=2345  2345  1234
1   2           how=3456 Hello Puppy when=7685  7685  3456
2   3  how=646 It is an Helloexample when=9089  9089   646
3   4     for how=6574 stackoverflow when=5764  5764  6574
4   5           Hello when=3632 World how=7654  3632  7654
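The columns extracted this way hold strings. If numbers are needed, a possible follow-up is sketched below. It assumes the values are always integers, and it adds a hypothetical third row without any key (not part of the question's data) to show how missing matches behave.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": [
            "Hello Kitty how=1234 when=2345",
            "Hello when=3632 World how=7654",
            "no keys here",  # hypothetical row, added only for illustration
        ]
    }
)

for key in ("how", "when"):
    # expand=False returns a Series; rows without a match become missing.
    raw = df["name"].str.extract(rf"{key}=(\d+)", expand=False)
    # The nullable Int64 dtype keeps integers even when some rows are missing.
    df[key] = pd.to_numeric(raw).astype("Int64")

print(df)
```

The nullable Int64 dtype is used here because a plain astype(int) would raise an error on the row with no match.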
Upvotes: 3