user19443053

Remove string from array using numpy

I have a CSV with a column of strings that mix text and numbers (e.g., '6 Years and 12 Months'). I am trying to find a way to convert the years and months into a new array containing just the total number of months.

YrsAndMonths=np.array(['6 Years and 12 Months','7 Years and 8 Months','2 Years'])

I am trying to get output like Months=['84','92','24'], but I'm not sure how to proceed from here.

Upvotes: 1

Views: 100

Answers (3)

Abhishek

Reputation: 1625

The code below does this using a list comprehension:

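import numpy as np

YrsAndMonths = np.array(['6 Years and 12 Months', '7 Years and 8 Months', '2 Years'])

# years * 12, plus the months when the string has an "and ... Months" part;
# i.split()[0] reads the whole first number, so multi-digit years also work
[str(int(i.split()[0]) * 12 + int(i.split('and')[-1].split('Months')[0])) if 'and' in i else str(int(i.split()[0]) * 12) for i in YrsAndMonths]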

Output: ['84', '92', '24']

Upvotes: 0

mozway

Reputation: 260490

This falls a bit outside of what numpy can do natively.

You could however use pandas, which is built on top of numpy and thus also enables vectorized operations:

import re
import pandas as pd

out = (
    pd.Series(YrsAndMonths)
    # extract the year and month values independently
    .str.extractall(r'(\d+)\s*year|(\d+)\s*month', flags=re.I)
    .astype(float)            # convert strings to floats (unmatched groups are NaN)
    .groupby(level=0).sum()   # sum per original row
    .mul([12, 1])             # multiply years by 12
    .sum(axis=1).astype(int)  # total months per row, as integers
    .to_numpy()               # convert to a numpy array
)

output: array([84, 92, 24])
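
If you need strings, as in the question's Months=['84','92','24'], one possible variation is to extract the years and the months with two separate str.extract calls and convert the totals to strings at the end. This is only a sketch, assuming each entry has at most one "N Years" part and one "N Months" part; the names years, months, total and months_as_str are illustrative and not from the question.

```python
import re
import pandas as pd

YrsAndMonths = ['6 Years and 12 Months', '7 Years and 8 Months', '2 Years']
s = pd.Series(YrsAndMonths)

# Pull each number out separately; rows without a match become NaN.
years = pd.to_numeric(s.str.extract(r'(\d+)\s*year', flags=re.I, expand=False))
months = pd.to_numeric(s.str.extract(r'(\d+)\s*month', flags=re.I, expand=False))

# Treat a missing part as 0, then combine into total months.
total = (years.fillna(0) * 12 + months.fillna(0)).astype(int)

# Convert to strings to match the format asked for in the question.
months_as_str = total.astype(str).tolist()
print(months_as_str)  # ['84', '92', '24']
```

Handling the two parts separately keeps each regular expression simple, at the cost of scanning the column twice.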

Upvotes: 1

imM4TT

Reputation: 282

Here is a simple approach that should work as long as your sentences follow this pattern:

sentences = ['6 Years and 12 Months','7 Years and 8 Months','2 Years']
res = []

for x in [sentence.lower() for sentence in sentences]:
    local_res = 0
    if "year" in x:
        # the text before "year" is the year count, e.g. "6 "
        local_res += int(x.split("year")[0]) * 12
    if "month" in x:
        # the text between "and" and "month" is the month count, e.g. " 12 "
        local_res += int(x.split("month")[0].split("and")[1])
    res.append(local_res)
    
print(res)
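
The split-based loop relies on "and" being present whenever "month" appears, so a string such as '8 Months' would raise an IndexError. A possible variant, offered only as a sketch, uses regular expressions instead. It assumes each string contains at most one "N Year(s)" part and at most one "N Month(s)" part. The helper name to_months and the last two test strings are illustrative and not from the question.

```python
import re

def to_months(text):
    """Return total months for strings like '6 Years and 12 Months'."""
    # Look for a number followed by "year"/"month", ignoring case.
    years = re.search(r'(\d+)\s*year', text, flags=re.I)
    months = re.search(r'(\d+)\s*month', text, flags=re.I)
    total = 0
    if years:
        total += int(years.group(1)) * 12
    if months:
        total += int(months.group(1))
    return total

sentences = ['6 Years and 12 Months', '7 Years and 8 Months', '2 Years',
             '8 Months', '10 Years and 1 Month']
res = [to_months(s) for s in sentences]
print(res)  # [84, 92, 24, 8, 121]
```

If you need strings rather than integers, wrap the call as str(to_months(s)).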

Upvotes: 1
