Reputation: 309
I have a list of texts, about 90% of which are in the format AABBB-CCCDDD001. A few texts in the list may instead look like
AABBBICS-CCCDDD001 or
AABBBIGW-CCCDDD001 or
AABBBRTL-CCCDDD001 or
AABBBTDZ-CCCDDD001
These are device names, where:
AA - country code
BBB - site code
CCC - Function code
DDD - Sub Function code.
For example: USNYCRTL-LANDCE001
If one of the codes ICS, IGW, RTL or TDZ appears in the text, I want to output its corresponding number, for which I have created a dictionary:
ENVIRONMENTCODE = {
    'ICS': '1',
    'IGW': '2',
    'RTL': '3',
    'TDZ': '4'
}
NULLCODE = {
    'NULL': '9'
}
So, if the text is:
AABBBICS-CCCDDD001 it should print '1' or
AABBBIGW-CCCDDD001 it should print '2' or
AABBBRTL-CCCDDD001 it should print '3' or
AABBBTDZ-CCCDDD001 it should print '4'
The example above, USNYCRTL-LANDCE001, should print '3' since RTL corresponds to '3' in the dictionary.
The 90% of texts in the format AABBB-CCCDDD001 should print '9', pairing with the key 'NULL'. There may also be a few texts like AABBBXYZ-CCCDDD001; XYZ is not in the dictionary, so it should be ignored and that text marked as '9' as well.
I know regex can be used here, but I'm in the early stages of learning Python and regex seems out of reach for me right now. This is what I have tried so far:
def environmentcode(self):
    idx = self.name.find('-')
    if idx > -1:
        if self.name in ENVIRONMENTCODE:
            return ENVIRONMENTCODE
        else:
            return NULLCODE
    else:
        return "Not Found"
It always returns the NULLCODE dictionary, regardless of whether one of the keys is in the text. Can anyone please help me with this?
Upvotes: 2
Views: 116
Reputation: 2407
My proposal:
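def environmentcode(s):
    if "-" not in s:  # (**)
        return None   # (**)
    h, t = s.split("-")
    code = h.strip()[5:]  # whatever follows the 5-character AABBB prefix
    # default is the string '9', matching the string values in ENVIRONMENTCODE
    return ENVIRONMENTCODE.get(code, '9')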
data="AABBBICS-CCCDDD001 AABBBIGW-CCCDDD001 AABBBRTL-CCCDDD001 AABBBTDZ-CCCDDD001 USNYCRTL-LANDCE001 AABBB-CCCDDD001 something"
for s in data.split():
    print(s, "-->", environmentcode(s))
Output:
AABBBICS-CCCDDD001 --> 1
AABBBIGW-CCCDDD001 --> 2
AABBBRTL-CCCDDD001 --> 3
AABBBTDZ-CCCDDD001 --> 4
USNYCRTL-LANDCE001 --> 3
AABBB-CCCDDD001 --> 9
something --> None
#---------------------------------------------------------
# Filtering text with regex. In this case, (**) not needed.
text="""AABBBICS-CCCDDD001 Alice was beginning to get very tired of sitting by her sister on the bank... AABBBIGW-CCCDDD001 AABBBRTL-CCCDDD001 AABBBTDZ-CCCDDD001 USNYCRTL-LANDCE001 AABBB-CCCDDD001 AABBBXYZ-CCCDDD001 something"""
import re
data = re.findall(r"\b[A-Z]{5,8}-[A-Z]{6}001\b", text)
for s in data:
    print(s, "-->", environmentcode(s))
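If you would rather let the regex pick out the code as well, a capture group can do it. This is only a sketch: NAME_RE and environmentcode_re are names made up here, and it assumes the part before the dash is always exactly 5 uppercase letters, optionally followed by a 3-letter code.

```python
import re

ENVIRONMENTCODE = {'ICS': '1', 'IGW': '2', 'RTL': '3', 'TDZ': '4'}

# Hypothetical pattern: 5 letters (AABBB), an optional 3-letter code, then '-'
NAME_RE = re.compile(r"^[A-Z]{5}([A-Z]{3})?-")

def environmentcode_re(s):
    m = NAME_RE.match(s)
    if m is None:
        return None  # name does not follow the assumed format
    # group(1) is None when the optional code is absent, so .get falls back to '9'
    return ENVIRONMENTCODE.get(m.group(1), '9')

for s in ["USNYCRTL-LANDCE001", "AABBB-CCCDDD001", "AABBBXYZ-CCCDDD001", "something"]:
    print(s, "-->", environmentcode_re(s))
```

Unknown codes such as XYZ and names without a code both fall through to '9', while names that do not fit the pattern give None.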
Upvotes: 0
Reputation: 55469
We can use .find to get the code word, if it exists, and then use the dictionary to map the code word to its code number. We can use the dictionary .get method to return the null code for missing or unknown code words. This version returns None if it encounters bad data: a name that doesn't contain '-', or a name that doesn't have either 8 or 5 letters before the '-'.
env_code = {
    'ICS': '1',
    'IGW': '2',
    'RTL': '3',
    'TDZ': '4',
}
null_code = '9'
def get_env_code(name):
    idx = name.find('-')
    if idx == 8:
        # code may be valid
        code = name[idx-3:idx]
    elif idx == 5:
        # code is missing
        code = ''
    else:
        # Bad name
        return None
    return env_code.get(code, null_code)
# test
data = [
    'AABBBICS-CCCDDD001',
    'AABBBIGW-CCCDDD001',
    'AABBBRTL-CCCDDD001',
    'AABBBTDZ-CCCDDD001',
    'USNYCRTL-LANDCE001',
    'AABBBXYZ-CCCDDD001',
    'AABBB-CCCDDD001',
    'BADDATA',
]
for s in data:
    print(s, get_env_code(s))
output
AABBBICS-CCCDDD001 1
AABBBIGW-CCCDDD001 2
AABBBRTL-CCCDDD001 3
AABBBTDZ-CCCDDD001 4
USNYCRTL-LANDCE001 3
AABBBXYZ-CCCDDD001 9
AABBB-CCCDDD001 9
BADDATA None
Here's a simpler version that returns the null code instead of None for bad data.
def get_env_code(name):
    idx = name.find('-')
    code = name[idx-3:idx] if idx == 8 else ''
    return env_code.get(code, null_code)
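As a quick check of how the simpler version treats the edge cases, here is a small sketch that repeats the definitions so it runs on its own (samples and results are names added for illustration). str.find returns -1 when there is no '-', so idx == 8 is false, code becomes '', and .get falls back to null_code.

```python
env_code = {'ICS': '1', 'IGW': '2', 'RTL': '3', 'TDZ': '4'}
null_code = '9'

def get_env_code(name):
    idx = name.find('-')  # -1 if there is no dash at all
    code = name[idx-3:idx] if idx == 8 else ''
    return env_code.get(code, null_code)

# Illustrative inputs: known code, unknown code, missing code, no dash
samples = ['USNYCRTL-LANDCE001', 'AABBBXYZ-CCCDDD001', 'AABBB-CCCDDD001', 'BADDATA']
results = {name: get_env_code(name) for name in samples}
print(results)
```

Only the first name maps to '3'; the other three all come back as '9'.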
Upvotes: 1
Reputation: 1319
If you're just checking whether a member of ENVIRONMENTCODE is found within each test string, then regex is not necessary. You can just use the Python keyword in, e.g.
ENVIRONMENTCODE = {
    'ICS': '1',
    'IGW': '2',
    'RTL': '3',
    'TDZ': '4'
}
NULLCODE = {
    'NULL': '9'
}
def environment_code(test_string, code_dict):
    if '-' not in test_string:
        return 'no dash'
    for code, value in code_dict.items():
        if code in test_string:
            return value
    return NULLCODE['NULL']
to_test = ['AABBBICS-CCCDDD001',
           'AABBBIGW-CCCDDD001',
           'AABBBRTL-CCCDDD001',
           'AABBBTDZ-CCCDDD001']
for test_str in to_test:
    print(environment_code(test_str, ENVIRONMENTCODE))
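One caveat: code in test_string looks at the whole string, so a code that happens to appear after the dash, or inside the site code, would also match. If that matters for your data, here is a hedged variant that only looks at the characters after the 5-letter prefix. environment_code_prefix is a name made up for this sketch, and the 8-character check is an assumption taken from the AABBB + code format in the question.

```python
ENVIRONMENTCODE = {'ICS': '1', 'IGW': '2', 'RTL': '3', 'TDZ': '4'}
NULLCODE = {'NULL': '9'}

def environment_code_prefix(test_string, code_dict):
    # Split once at the first dash; sep is '' when there is no dash
    head, sep, _tail = test_string.partition('-')
    if not sep:
        return 'no dash'
    # Assumed format: 5 prefix letters plus a 3-letter code before the dash
    if len(head) == 8:
        return code_dict.get(head[5:], NULLCODE['NULL'])
    return NULLCODE['NULL']

for test_str in ['USNYCRTL-LANDCE001', 'AABBB-ICSDDD001', 'AABBBXYZ-CCCDDD001']:
    print(test_str, environment_code_prefix(test_str, ENVIRONMENTCODE))
```

Here 'AABBB-ICSDDD001' gives '9', because ICS appears only after the dash.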
The problem with your original code was that you were trying to do test_string in code_dict, which only checks for exact matches between the string under test and the keys within the dictionary. It also returned the whole dictionary rather than a single value from it.
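To make the difference concrete, here is a small sketch of what in does on a dictionary versus on a string, using the dictionary from the question (the variable names are just for illustration):

```python
ENVIRONMENTCODE = {'ICS': '1', 'IGW': '2', 'RTL': '3', 'TDZ': '4'}
name = 'USNYCRTL-LANDCE001'

# On a dict, `in` asks: is this exact string one of the keys?
whole_name_is_key = name in ENVIRONMENTCODE   # False: the full name is not a key
code_is_key = 'RTL' in ENVIRONMENTCODE        # True: 'RTL' is a key

# On a string, `in` asks: is this a substring?
code_in_name = 'RTL' in name                  # True

print(whole_name_is_key, code_is_key, code_in_name)
```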
Upvotes: 0