Reputation: 864
df = pd.DataFrame({'columnA': ['apple:50-100(+)', 'peach:75-125(-)', 'banana:100-150(+)']})
I'm new to regular expressions. What is the best way to split 'apple:50-100(+)'
(and the other example strings above) into separate DataFrame columns,
as shown below?
Desired output:
Upvotes: 2
Views: 1484
Reputation: 32244
re.split can be used to split a string wherever a pattern matches. For the example you have given, the following should work:
re.split(r'[\:\-\(\)]+', your_string)
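It splits the string on every run of colons, hyphens and parentheses (":", "-", "(" and ")"). Note that "(-)" consists only of delimiter characters, so the sign is lost for strings such as 'peach:75-125(-)'.
The result has an empty string as its last element. You can either slice it off: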
re.split(r'[\:\-\(\)]+', your_string)[:-1]
Or filter out empty values (in Python 3, wrap the call in list() to get a list rather than a filter object):
list(filter(None, re.split(r'[\:\-\(\)]+', your_string)))
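A minimal sketch of the splitting approach applied to all three sample strings. The second pattern and the helper name `split_fields` are assumptions, not part of the original answer: it splits on ":", "(" and ")" one at a time and on "-" only between digits, so a "(-)" suffix keeps its sign.

```python
import re

# Sample strings taken from the question.
strings = ['apple:50-100(+)', 'peach:75-125(-)', 'banana:100-150(+)']

# The character-class pattern treats "(-)" as one run of delimiters,
# so the minus sign disappears from the result.
naive = re.split(r'[:\-()]+', 'peach:75-125(-)')
print(naive)  # ['peach', '75', '125', '']

# Assumed alternative: split on ":", "(" and ")" individually, and on "-"
# only when it sits between two digits.
delimiter = re.compile(r'[:()]|(?<=\d)-(?=\d)')

def split_fields(text):
    """Split text on the delimiters and drop empty strings."""
    return [part for part in delimiter.split(text) if part]

rows = [split_fields(s) for s in strings]
print(rows)
```

Each inner list in `rows` holds the four fields of one input string, so the list can be passed to a DataFrame constructor as-is.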
Upvotes: 1
Reputation: 2702
I can update the regex if you provide more details on the format.
import pandas as pd
df = pd.DataFrame({'columnA': ['apple:50-100(+)', 'peach:75-125(-)', 'banana:100-150(+)']})
pattern = r"(.*):(\d+)-(\d+)\(([+-])\)"
new_df = df['columnA'].str.extract(pattern)
df:
             columnA
0    apple:50-100(+)
1    peach:75-125(-)
2  banana:100-150(+)

new_df:
        0    1    2  3
0   apple   50  100  +
1   peach   75  125  -
2  banana  100  150  +
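If labeled columns and numeric types are wanted, the same pattern can probably be written with named groups, which `str.extract` uses as column names. The sketch below assumes the hypothetical names `name`, `start`, `end` and `sign`; the question does not say what the fields are called.

```python
import pandas as pd

df = pd.DataFrame({'columnA': ['apple:50-100(+)', 'peach:75-125(-)', 'banana:100-150(+)']})

# Named groups become column names; the names are illustrative assumptions.
pattern = r"(?P<name>.*):(?P<start>\d+)-(?P<end>\d+)\((?P<sign>[+-])\)"
new_df = df['columnA'].str.extract(pattern)

# str.extract returns strings, so convert the numeric columns explicitly.
new_df = new_df.astype({'start': int, 'end': int})
print(new_df)
```

Rows that do not match the pattern would come back as NaN, in which case the integer conversion fails, so this assumes every string follows the format shown.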
Upvotes: 4
Reputation: 5372
Here is an alternative:
Python 3.7.5 (default, Oct 17 2019, 12:16:48)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> import pandas as pd
>>> split_it = re.compile(r'(\w+):(\d+)[-](\d+)\((.)\)')
>>> df = pd.DataFrame(split_it.findall('apple:50-100(+)'))
>>> df
       0   1    2  3
0  apple  50  100  +
>>>
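The session above handles a single string. A rough sketch of extending it to every sample string follows, using the same compiled pattern; the `rows` list and the choice to raise on a non-matching string are assumptions added for illustration.

```python
import re

# Sample strings taken from the question.
strings = ['apple:50-100(+)', 'peach:75-125(-)', 'banana:100-150(+)']

# Same pattern as in the session above.
split_it = re.compile(r'(\w+):(\d+)[-](\d+)\((.)\)')

rows = []
for text in strings:
    match = split_it.fullmatch(text)
    if match is None:
        # Assumed policy: fail loudly rather than skip malformed input.
        raise ValueError(f'unexpected format: {text!r}')
    rows.append(match.groups())

print(rows)
```

Passing `rows` to `pd.DataFrame(rows)` should then give one row per input string, with the four captured groups as columns.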
Upvotes: 0