Reputation:
I have a CSV with a column containing strings that mix numbers and units (e.g., '6 Years and 12 Months'). I am trying to convert the years and months into a new array containing just the total number of months.
import numpy as np
YrsAndMonths = np.array(['6 Years and 12 Months', '7 Years and 8 Months', '2 Years'])
I would like an output like Months=['84','92','24'], but I am not sure how to proceed from here.
Upvotes: 1
Views: 100
Reputation: 1625
The code below does this using a list comprehension:
YrsAndMonths = np.array(['6 Years and 12 Months', '7 Years and 8 Months', '2 Years'])
# i.split()[0] reads the whole leading number, so multi-digit years also work
[str(int(i.split()[0]) * 12 + (int(i.split('and')[-1].split('Months')[0]) if 'and' in i else 0))
 for i in YrsAndMonths]
Output: ['84', '92', '24']
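For inputs beyond the three examples, such as multi-digit year counts or strings that contain only months, a regex-based helper may be more forgiving. This is only a minimal sketch: the name `to_months` is hypothetical, and it assumes the units are always spelled "year(s)" or "month(s)".

```python
import re

# Hypothetical helper; assumes units are spelled "year(s)" / "month(s)".
_PATTERN = re.compile(r"(\d+)\s*(year|month)", flags=re.IGNORECASE)

def to_months(text):
    """Return the total number of months described by `text`."""
    total = 0
    for count, unit in _PATTERN.findall(text):
        # Each year contributes 12 months; months are added as-is.
        total += int(count) * (12 if unit.lower() == "year" else 1)
    return total

samples = ["6 Years and 12 Months", "7 Years and 8 Months", "2 Years"]
months = [str(to_months(s)) for s in samples]
print(months)  # ['84', '92', '24']
```

Because each number is matched together with its unit, the word "and" and the order of the parts do not matter here.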
Upvotes: 0
Reputation: 260490
This falls a bit outside of what numpy can do natively.
You could, however, use pandas, which is built on top of numpy and therefore also supports vectorized operations:
import re
import pandas as pd

out = (
    pd.Series(YrsAndMonths)
    # extract the year and month values independently
    .str.extractall(r'(\d+)\s*year|(\d+)\s*month', flags=re.I)
    .astype(float)            # convert strings to floats (unmatched groups stay NaN)
    .groupby(level=0).sum()   # sum per original row
    .mul([12, 1])             # multiply years by 12
    .sum(axis=1).astype(int)  # total months per row, as integers
    .to_numpy()               # convert to a numpy array
)
output: array([84, 92, 24])
Upvotes: 1
Reputation: 282
Here is an approach tailored to the pattern of your sentences:
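sentences = ['6 Years and 12 Months', '7 Years and 8 Months', '2 Years']
res = []
# lowercase first so that "Years"/"years" and "Months"/"months" are treated alike
for x in [sentence.lower() for sentence in sentences]:
    local_res = 0
    if "year" in x:
        # the text before "year" is the year count
        year = x.split("year")
        cnt = year[0]
        local_res += int(cnt) * 12
    if "month" in x:
        # assumes the form "<n> years and <m> months": the month count follows "and"
        month = x.split("month")[0].split("and")[1].strip()
        cnt = month
        local_res += int(cnt)
    res.append(local_res)
print(res)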
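The same split-based idea can be loosened a little so that a sentence with only months (no "and") is also handled. The following is a sketch, not a definitive implementation: the helper name `months_from_tokens` and the extra sample "5 Months" are illustrative assumptions, and it expects every number to be immediately followed by its unit.

```python
def months_from_tokens(sentence):
    """Hypothetical helper: sum (number, unit) token pairs into months."""
    tokens = sentence.lower().split()
    total = 0
    # Look at each token together with the token that follows it.
    for value, unit in zip(tokens, tokens[1:]):
        if not value.isdigit():
            continue  # skip words such as "years" or "and"
        if unit.startswith("year"):
            total += int(value) * 12
        elif unit.startswith("month"):
            total += int(value)
    return total

sentences = ["6 Years and 12 Months", "7 Years and 8 Months", "2 Years", "5 Months"]
print([months_from_tokens(s) for s in sentences])  # [84, 92, 24, 5]
```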
Upvotes: 1