aero8991

Reputation: 333

Extract specific words from a pandas Series

I have a pandas Series with 400k+ rows of data that looks roughly like this:

type = [['UNSPECIFIED SITE OF LEFT FEMALE BREAST', 'UPPER-OUTER QUADRANT OF RIGHT FEMALE BREAST'] ,
 'UPPER-OUTER QUADRANT OF RIGHT FEMALE BREAST',
 'UPPER-OUTER QUADRANT OF LEFT FEMALE BREAST',
 'AXILLARY TAIL OF LEFT FEMALE BREAST',
 'OVERLAPPING SITES OF LEFT FEMALE BREAST',
 'CENTRAL PORTION OF UNSPECIFIED FEMALE BREAST',
 'CENTRAL PORTION OF RIGHT FEMALE BREAST',
 'UNSPECIFIED SITE OF UNSPECIFIED FEMALE BREAST',
 'UPPER-INNER QUADRANT OF LEFT FEMALE BREAST',
 'LOWER-OUTER QUADRANT OF RIGHT FEMALE BREAST',
 'OVERLAPPING SITES OF RIGHT FEMALE BREAST',
 'UNSPECIFIED SITE OF RIGHT FEMALE BREAST',
 'UPPER-OUTER QUADRANT OF UNSPECIFIED FEMALE BREAST',
 'LOWER-INNER QUADRANT OF LEFT FEMALE BREAST',
 'NIPPLE AND AREOLA, LEFT FEMALE BREAST',
 ['NIPPLE AND AREOLA, RIGHT FEMALE BREAST', 'NIPPLE AND AREOLA, LEFT FEMALE BREAST'],
 ['LOWER-OUTER QUADRANT OF LEFT FEMALE BREAST', 'UPPER-INNER QUADRANT OF UNSPECIFIED FEMALE BREAST'],
 'UPPER-INNER QUADRANT OF UNSPECIFIED FEMALE BREAST',
 'UPPER-INNER QUADRANT OF RIGHT FEMALE BREAST',
 'AXILLARY TAIL OF RIGHT FEMALE BREAST',
 'CENTRAL PORTION OF LEFT FEMALE BREAST',
 'OVERLAPPING SITES OF UNSPECIFIED FEMALE BREAST',
 'LOWER-INNER QUADRANT OF UNSPECIFIED FEMALE BREAST',
 'NIPPLE AND AREOLA, UNSPECIFIED FEMALE BREAST',
 ['LOWER-INNER QUADRANT OF RIGHT FEMALE BREAST', 'LOWER-OUTER QUADRANT OF UNSPECIFIED FEMALE BREAST'],
 'LOWER-OUTER QUADRANT OF RIGHT FEMALE BREAST']

I want to extract the location information. How would I pull specific words out of each entry? I want to extract the words LEFT, RIGHT, and UNSPECIFIED and put them in a separate column.

Expected output:

result = 
[[ LEFT, RIGHT ],
RIGHT,
LEFT,
LEFT,
LEFT,
LEFT,
UNSPECIFIED,
RIGHT,
UNSPECIFIED,
LEFT,
RIGHT,
RIGHT,
RIGHT,
UNSPECIFIED,
LEFT,
LEFT,
[RIGHT, LEFT],
[LEFT, UNSPECIFIED],
UNSPECIFIED,
RIGHT,
RIGHT,
LEFT,
UNSPECIFIED,
UNSPECIFIED,
UNSPECIFIED,
[RIGHT, UNSPECIFIED],
RIGHT]

Upvotes: 0

Views: 63

Answers (1)

wwnde

Reputation: 26676

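import pandas as pd
# typed is the list from the question (named `type` there, which shadows the Python builtin)
df = pd.DataFrame(pd.Series(typed, name='Description'))
df['search'] = df['Description'].str.findall('LEFT|RIGHT|UNSPECIFIED')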

Outcome 1: DataFrame

print(df)

Outcome 2: list

result = df['search'].to_list()

result
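
Two caveats with the snippet above, as far as I can tell: `.str.findall` yields NaN for rows that hold a list rather than a string, and the plain `LEFT|RIGHT|UNSPECIFIED` pattern also matches the UNSPECIFIED in "UNSPECIFIED SITE OF ...". A minimal sketch of one way around both, assuming every description ends in "<side> FEMALE BREAST" as in the question's sample. The helper `extract_side`, the pattern `SIDE`, the column name `side`, and the shortened sample list are my own names for illustration, not part of the original answer:

```python
import re
import pandas as pd

# Shortened, illustrative sample in the same shape as the question's data:
# each row is either one description or a list of descriptions.
typed = [
    ['UNSPECIFIED SITE OF LEFT FEMALE BREAST',
     'UPPER-OUTER QUADRANT OF RIGHT FEMALE BREAST'],
    'UPPER-OUTER QUADRANT OF RIGHT FEMALE BREAST',
    'UNSPECIFIED SITE OF UNSPECIFIED FEMALE BREAST',
    'NIPPLE AND AREOLA, LEFT FEMALE BREAST',
]

# Assumption: the side always sits directly before "FEMALE BREAST".
# The lookahead skips the "UNSPECIFIED" in "UNSPECIFIED SITE OF ...".
SIDE = re.compile(r'(LEFT|RIGHT|UNSPECIFIED)(?= FEMALE BREAST)')

def extract_side(value):
    """Return one side for a string, or a list of sides for a list of strings."""
    if isinstance(value, list):
        return [extract_side(item) for item in value]
    match = SIDE.search(value)
    return match.group(1) if match else None

df = pd.DataFrame(pd.Series(typed, name='Description'))
# map() calls the helper once per row, so list rows are handled too.
df['side'] = df['Description'].map(extract_side)
result = df['side'].to_list()
print(result)
```

This keeps the nested shape the question asks for (a list for list rows, a single word otherwise). On 400k+ rows, `map` with a Python function is a row-by-row loop, so it may be worth timing against an `explode` plus `.str.extract` approach if speed matters.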

Upvotes: 2
