Reputation: 326
I have a pandas dataframe with a column 'iso' containing chemical isotope symbols, such as '4He', '16O', '197Au'. I want to label many (but not all) isotopes on a plot using the annotate() function in matplotlib. The label format should have the atomic mass in superscript. I can do this with LaTeX-style formatting:
axis.annotate('$^{4}$He', xy=(x, y), xycoords='data')
I could write dozens of annotate() statements like the one above, one for each isotope I want to label, but I'd rather automate this.
How can I extract the isotope number and name from my iso column?
With those pieces extracted I can make the labels. Let's say we put them into the variables Num and Sym. Now I can loop over my isotopes and do something like this:
for i in list_of_isotopes:
    (Num, Sym) = df[df.iso==i].iso.str.MISSING_STRING_METHOD(???)
    axis.annotate('$^{%s}$%s' %(Num, Sym), xy=(x[Num], y[Num]), xycoords='data')
Presumably, there is a pandas string method that I can drop into the above, but I'm having trouble coming up with a solution. I've been trying split() and extract() with a few different patterns, but can't get the desired effect.
Upvotes: 5
Views: 16269
Reputation: 552
The accepted answer pointed me in the right direction, but I think the right pandas function to use is extract. That way only the matched groups of the regular expression are returned, which eliminates the need to slice afterwards.
import pandas as pd
df = pd.DataFrame({'iso': ['4He', '16O', '197Au']})
# Raw string so that \d is passed to the regex engine unchanged
df[['num', 'element']] = df['iso'].str.extract(r'(\d+)([A-Za-z]+)', expand=True)
print(df)
gives
     iso  num element
0    4He    4      He
1    16O   16       O
2  197Au  197      Au
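Building on the extract approach, the labels themselves can also be assembled column-wise. This is only a sketch: the named groups and the 'label' column name are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'iso': ['4He', '16O', '197Au']})

# Named groups become the column names of the extracted frame.
parts = df['iso'].str.extract(r'(?P<num>\d+)(?P<element>[A-Za-z]+)')

# Concatenate the pieces into mathtext labels such as '$^{4}$He'.
# The 'label' column name is a hypothetical choice.
df['label'] = '$^{' + parts['num'] + '}$' + parts['element']

print(df['label'].tolist())
```

Each entry in df['label'] can then be passed as the first argument to annotate(), with the coordinates taken from wherever the plotted data lives.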
Upvotes: 1
Reputation: 21878
This is my answer using split. The regexp can probably be improved; I'm very bad at that sort of thing :-) (\d+) stands for the integers, and ([A-Za-z]+) stands for the strings.
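import pandas as pd
df = pd.DataFrame({'iso': ['4He', '16O', '197Au']})
# Raw string so that \d is passed to the regex engine unchanged
result = df['iso'].str.split(r'(\d+)([A-Za-z]+)', expand=True)
# Columns 1 and 2 hold the captured number and symbol
result = result.loc[:, [1, 2]].rename(columns={1: 'x', 2: 'y'})
print(result)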
Produces
     x   y
0    4  He
1   16   O
2  197  Au
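The reason only columns 1 and 2 are kept is, as far as I can tell, that a regex split behaves like re.split here: the captured groups are included in the result, and the empty strings before and after the match end up in the outer columns. A small sketch with the standard library shows the pieces for a single symbol:

```python
import re

pattern = r'(\d+)([A-Za-z]+)'

# The whole string is the separator, so what remains on either side is empty;
# the two captured groups are returned in between.
pieces = re.split(pattern, '197Au')

print(pieces)  # ['', '197', 'Au', '']
```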
Upvotes: 12
Reputation: 82
Did you try strip()? Maybe you can consider this:
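import string
for i in list_of_isotopes:
    # i is already the isotope string, so strip it directly
    Num = i.strip(string.ascii_letters)  # '16O' -> '16'
    Sym = i.strip(string.digits)         # '16O' -> 'O'
    # Braces make the whole number superscript, not just its first digit
    axis.annotate('$^{%s}$%s' %(Num, Sym), xy=(x[Num], y[Num]), xycoords='data')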
Upvotes: 0
Reputation: 8583
To extract the number and the element of an isotope symbol you can use a regular expression (short: regex) in combination with Python's re module. The regex first matches the digits and then the letters; both are captured in named groups and can be accessed by group name. If the regex matches, you can extract the data and .format() the desired annotation string:
#!/usr/bin/env python3
# coding: utf-8
import re
iso_num = '16O'
preg = re.compile(r'^(?P<num>[0-9]*)(?P<element>[A-Za-z]*)$')
m = preg.match(iso_num)
if m:
    num = m.group('num')
    element = m.group('element')
    # Doubled braces come out of .format() as literal braces: '$^{16}$O'
    note = '$^{{{}}}${}'.format(num, element)
    # axis.annotate(note, xy=(x, y), xycoords='data')
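One detail worth checking when building the label with .format(): str.format treats {} as a replacement field, so literal braces around the superscript have to be doubled. A minimal sketch of a reusable helper follows; the name isotope_label and the stricter pattern (at least one digit and one letter) are assumptions for illustration, not part of the answer above.

```python
import re

# Hypothetical pattern: one or more digits followed by one or more letters.
ISOTOPE_RE = re.compile(r'(?P<num>[0-9]+)(?P<element>[A-Za-z]+)')


def isotope_label(iso):
    """Return a mathtext label such as '$^{16}$O' for an isotope symbol like '16O'."""
    m = ISOTOPE_RE.fullmatch(iso)
    if m is None:
        raise ValueError('not an isotope symbol: {!r}'.format(iso))
    # '{{' and '}}' produce literal braces; the inner '{}' is the placeholder.
    return '$^{{{}}}${}'.format(m.group('num'), m.group('element'))


print(isotope_label('16O'))  # $^{16}$O
```

The returned string can be passed directly as the text argument of annotate().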
Upvotes: 0
Reputation: 4570
I'd use simple string manipulation, without the hassle of regex.
isotopes = ['4He', '16O', '197Au']
def get_num(isotope):
    # filter() returns an iterator in Python 3, so join the digits back into a string
    return ''.join(filter(str.isdigit, isotope))
def get_sym(isotope):
    return isotope.replace(get_num(isotope), '')
def get_num_sym(isotope):
    return (get_num(isotope), get_sym(isotope))
for isotope in isotopes:
    num, sym = get_num_sym(isotope)
    print(num, sym)
Upvotes: 0