Reputation: 679
I'm experimenting/learning Python with a data set containing information on companies.
The DataFrame structure is the following (these are made up records):
import pandas as pd
df = pd.DataFrame({'key': [111, 222, 333, 444, 555, 666, 777, 888, 999],
'left_name' : ['ET CETERA SYSTEMS', 'ODDS AND ENDS', 'MAXIMA COMPANY', 'MUSIC MANY',
'GRAPHIC MASTER', 'ARC SECURITY', 'MINDNSOLES', 'REX ENERGY', 'THESIS COMPANY'],
'right_name' : ['ET CETERA SYS', 'ODDSNENDS', 'MAX COMP', 'MUSICMANY', 'GRAPHIC MSTR',
'ARC SECU', 'MIND AND SOLES', 'REXX', 'THESIS COMP']})
print(df)
   key          left_name      right_name
0  111  ET CETERA SYSTEMS   ET CETERA SYS
1  222      ODDS AND ENDS       ODDSNENDS
2  333     MAXIMA COMPANY        MAX COMP
3  444         MUSIC MANY       MUSICMANY
4  555     GRAPHIC MASTER    GRAPHIC MSTR
5  666       ARC SECURITY        ARC SECU
6  777         MINDNSOLES  MIND AND SOLES
7  888         REX ENERGY            REXX
8  999     THESIS COMPANY     THESIS COMP
My goal is to compare the acronyms of each (left_name, right_name) pair. Specifically, if the abbreviated string formed by concatenating the initial letters of the words in left_name equals the abbreviated string formed by concatenating the initial letters of the words in right_name, then return a flag of 1. Otherwise, return 0.
For instance, if we compare the first two abbreviated pairs, then:
ECS == ECS → 1
OAE != O → 0
Visually, the resulting DataFrame I'm looking for should look like this:
   key          left_name      right_name  name_flag
0  111  ET CETERA SYSTEMS   ET CETERA SYS          1
1  222      ODDS AND ENDS       ODDSNENDS          0
2  333     MAXIMA COMPANY        MAX COMP          1
3  444         MUSIC MANY       MUSICMANY          0
4  555     GRAPHIC MASTER    GRAPHIC MSTR          1
5  666       ARC SECURITY        ARC SECU          1
6  777         MINDNSOLES  MIND AND SOLES          0
7  888         REX ENERGY            REXX          0
8  999     THESIS COMPANY     THESIS COMP          1
I think my question is closely related to this one: Upper case first letter of each word in a phrase
Unfortunately, I wasn't able to adapt the code appropriately for my problem. Any additional help would be greatly appreciated.
Upvotes: 3
Views: 484
Reputation: 296
You can get this with
df['name_flag'] = df.apply(lambda x:x.left_name.startswith(x.right_name),axis=1).map({True:1,False:0})
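One caveat worth checking on the sample data: startswith compares the full names rather than their acronyms, so a pair such as MAXIMA COMPANY / MAX COMP gets 0 even though both abbreviate to MC. Below is a minimal sketch of a row-wise variant that compares initials instead, assuming words are separated by whitespace. The helper name initials and the three-row frame are illustrative only.

```python
import pandas as pd

# Three rows taken from the question's sample data.
df = pd.DataFrame({
    'left_name': ['ET CETERA SYSTEMS', 'ODDS AND ENDS', 'MAXIMA COMPANY'],
    'right_name': ['ET CETERA SYS', 'ODDSNENDS', 'MAX COMP'],
})

def initials(name):
    # Hypothetical helper: first character of each whitespace-separated word.
    return ''.join(word[0] for word in name.split())

# Full-name prefix test, as in the one-liner above.
prefix_flag = df.apply(
    lambda x: x.left_name.startswith(x.right_name), axis=1
).astype(int)

# Acronym comparison, which is what the question describes.
acronym_flag = df.apply(
    lambda x: initials(x.left_name) == initials(x.right_name), axis=1
).astype(int)

print(prefix_flag.tolist())   # [1, 0, 0]
print(acronym_flag.tolist())  # [1, 0, 1]
```

The two flags differ on the third row, which is the case the expected output in the question marks as 1.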
Upvotes: 1
Reputation: 4618
def abbr(x):
    return ''.join([letter[0] for letter in x.split(' ')])

df['name_flag'] = (df['left_name'].apply(abbr) == df['right_name'].apply(abbr)).astype(int)
output:
0 1
1 0
2 1
3 0
4 1
5 1
6 0
7 0
8 1
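Alternatively, after import re, either of the following also works as the return value of the function (with s as the input string):
''.join(re.findall(r'^[A-Z]|\s[A-Z]',s)).replace(' ','')
or
''.join(re.findall(r'\b\w',s))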
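To see how the three variants behave side by side, here is a small sketch that wraps each one in its own function. The function names are hypothetical, and A&B CO is a made-up name that is not in the question's data; it is included only because punctuation is where the word-boundary pattern can diverge from the other two.

```python
import re

def abbr_split(s):
    # First character of each whitespace-separated word.
    return ''.join(word[0] for word in s.split())

def abbr_caps(s):
    # An uppercase letter at the start of the string or right after whitespace.
    return ''.join(re.findall(r'^[A-Z]|\s[A-Z]', s)).replace(' ', '')

def abbr_boundary(s):
    # The first word character after every word boundary.
    return ''.join(re.findall(r'\b\w', s))

for name in ['ET CETERA SYSTEMS', 'MIND AND SOLES', 'A&B CO']:
    print(name, '|', abbr_split(name), abbr_caps(name), abbr_boundary(name))
# ET CETERA SYSTEMS | ECS ECS ECS
# MIND AND SOLES | MAS MAS MAS
# A&B CO | AC AC ABC
```

On the names in the question all three agree. The \b\w pattern treats & as a boundary, so which variant fits depends on how punctuation inside names should be counted.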
Upvotes: 3
Reputation: 25239
Try this:
l = df.left_name.str.findall(r'\b\w')
r = df.right_name.str.findall(r'\b\w')
df['name_flag'] = (l == r).astype(int)
Out[366]:
   key          left_name      right_name  name_flag
0  111  ET CETERA SYSTEMS   ET CETERA SYS          1
1  222      ODDS AND ENDS       ODDSNENDS          0
2  333     MAXIMA COMPANY        MAX COMP          1
3  444         MUSIC MANY       MUSICMANY          0
4  555     GRAPHIC MASTER    GRAPHIC MSTR          1
5  666       ARC SECURITY        ARC SECU          1
6  777         MINDNSOLES  MIND AND SOLES          0
7  888         REX ENERGY            REXX          0
8  999     THESIS COMPANY     THESIS COMP          1
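If it helps to inspect the acronyms themselves, the lists returned by str.findall can be joined into strings with str.join before comparing. This is only a sketch of that variation on three rows of the sample data; the column names left_abbr and right_abbr are assumptions added for illustration.

```python
import pandas as pd

# Three rows taken from the question's sample data.
df = pd.DataFrame({
    'left_name': ['ET CETERA SYSTEMS', 'ODDS AND ENDS', 'MINDNSOLES'],
    'right_name': ['ET CETERA SYS', 'ODDSNENDS', 'MIND AND SOLES'],
})

# findall gives a list of initials per row; str.join('') turns it into a string.
df['left_abbr'] = df['left_name'].str.findall(r'\b\w').str.join('')
df['right_abbr'] = df['right_name'].str.findall(r'\b\w').str.join('')

# Comparing two string columns element-wise yields booleans; cast to 0/1.
df['name_flag'] = (df['left_abbr'] == df['right_abbr']).astype(int)

print(df[['left_abbr', 'right_abbr', 'name_flag']])
```

Keeping the intermediate columns makes it easy to see why a row was flagged 0, for example OAE against O in the second row.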
Upvotes: 3
Reputation: 2838
This will do the job
def get_acronym(phrase):
    words = phrase.split(' ')
    return ''.join(w[0] for w in words)

df['name_flag'] = df.right_name.map(get_acronym) == df.left_name.map(get_acronym)
df['name_flag'] = df['name_flag'].astype(int)
df
output
   key          left_name      right_name  name_flag
0  111  ET CETERA SYSTEMS   ET CETERA SYS          1
1  222      ODDS AND ENDS       ODDSNENDS          0
2  333     MAXIMA COMPANY        MAX COMP          1
3  444         MUSIC MANY       MUSICMANY          0
4  555     GRAPHIC MASTER    GRAPHIC MSTR          1
5  666       ARC SECURITY        ARC SECU          1
6  777         MINDNSOLES  MIND AND SOLES          0
7  888         REX ENERGY            REXX          0
8  999     THESIS COMPANY     THESIS COMP          1
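One thing to watch with real data: split(' ') produces empty strings when a name has consecutive, leading, or trailing spaces, and indexing an empty string raises IndexError. A possible guard, sketched here under the assumption that any run of whitespace should count as a single separator, is to call split() with no argument. The name get_acronym_safe and the test strings with extra spaces are made up for illustration.

```python
def get_acronym_safe(phrase):
    # split() with no argument splits on runs of whitespace and drops empty pieces.
    return ''.join(w[0] for w in phrase.split())

print(get_acronym_safe('REX  ENERGY'))     # RE
print(get_acronym_safe(' ARC SECURITY '))  # AS

# The split(' ') version fails on the same double-spaced input.
try:
    ''.join(w[0] for w in 'REX  ENERGY'.split(' '))
except IndexError as exc:
    print('split(" ") version raised IndexError:', exc)
```

It can be passed to map in the same way as get_acronym above.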
Upvotes: 3