Reputation: 4498
I created a DataFrame df with a category column holding the following values:
category
20150115_Holiday_HK_Misc
20150115_Holiday_SG_Misc
20140116_DE_ProductFocus
20140116_UK_ProductFocus
I want to create 3 new columns:
category | A | B | C
20150115_Holiday_HK_Misc | 20150115_Holiday_Misc | HK | Holiday_Misc
20150115_Holiday_SG_Misc | 20150115_Holiday_Misc | SG | Holiday_Misc
20140116_DE_ProductFocus | 20140116_ProductFocus | DE | ProductFocus
20140116_UK_ProductFocus | 20140116_ProductFocus | UK | ProductFocus
Column A is category with the country code (e.g. "_HK") taken out. I think I need to hard-code the codes, but that is fine because I have the list of all country codes.
Column B is that country code.
Column C is column A without the leading date.
I tried something like this, but I am not getting far:
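import numpy as np
# np.where needs the 1-D boolean mask itself, not a list wrapping it; this only handles "HK"
df['B'] = np.where(df['category'].str.contains('HK'), 'HK', 'Not Specified')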
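Since the full list of country codes is known, one option is to build a single regex alternation from that list instead of one np.where call per code. A minimal sketch, assuming the list is called country_codes (a hypothetical name) and that each code appears as its own underscore-delimited token:

```python
import re

import pandas as pd

# Hypothetical list of country codes; replace with the real list.
country_codes = ["HK", "SG", "DE", "UK"]

df = pd.DataFrame({"category": [
    "20150115_Holiday_HK_Misc",
    "20150115_Holiday_SG_Misc",
    "20140116_DE_ProductFocus",
    "20140116_UK_ProductFocus",
]})

# Match "_<code>" only when followed by "_" or the end of the string,
# so a code is never matched as part of a longer word.
codes = "|".join(map(re.escape, country_codes))
pattern = rf"_({codes})(?=_|$)"

# B: the country code itself (NaN if no code from the list is present)
df["B"] = df["category"].str.extract(pattern, expand=False)
# A: category with the "_<code>" token removed
df["A"] = df["category"].str.replace(pattern, "", regex=True)
# C: A without the leading date and its underscore
df["C"] = df["A"].str.replace(r"^\d+_", "", regex=True)

print(df[["category", "A", "B", "C"]])
```

Unlike a two-character wildcard, this only ever matches codes that are actually in the list.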
Thank you
Upvotes: 1
Views: 1178
Reputation: 210832
You can use the Series.str.replace() and Series.str.extract() methods:
# remove the two-character country code surrounded by '_' (regex=True is required on pandas >= 2.0)
df['A'] = df.category.str.replace(r'_\w{2}_', '_', regex=True)
# extract the two-character country code surrounded by '_'
df['B'] = df.category.str.extract(r'_(\w{2})_', expand=False)
# drop the leading date (digits followed by '_') from column A
df['C'] = df.A.str.extract(r'\d+_(.*)', expand=False)
Result:
In [148]: df
Out[148]:
category A B C
0 20150115_Holiday_HK_Misc 20150115_Holiday_Misc HK Holiday_Misc
1 20150115_Holiday_SG_Misc 20150115_Holiday_Misc SG Holiday_Misc
2 20140116_DE_ProductFocus 20140116_ProductFocus DE ProductFocus
3 20140116_UK_ProductFocus 20140116_ProductFocus UK ProductFocus
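A stricter variant of the same approach, assuming every country code is exactly two uppercase ASCII letters: \w also matches lowercase letters, digits and "_", so [A-Z]{2} avoids treating other two-character tokens as codes. A self-contained sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({"category": [
    "20150115_Holiday_HK_Misc",
    "20150115_Holiday_SG_Misc",
    "20140116_DE_ProductFocus",
    "20140116_UK_ProductFocus",
]})

# A: replace "_XX_" (XX = two uppercase letters) with a single "_"
df["A"] = df["category"].str.replace(r"_[A-Z]{2}_", "_", regex=True)
# B: capture the two uppercase letters between underscores
df["B"] = df["category"].str.extract(r"_([A-Z]{2})_", expand=False)
# C: everything in A after the leading digits and underscore
df["C"] = df["A"].str.extract(r"^\d+_(.*)", expand=False)

print(df)
```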
Upvotes: 5
Reputation: 5945
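You can also use the re module with apply:
import re
# A: keep the text before and after the two-character code, dropping the code
df['A'] = df.category.apply(lambda x: re.sub(r'(.*)_(\w\w)_(.*)', r'\1_\3', x))
# B: keep only the two-character code
df['B'] = df.category.apply(lambda x: re.sub(r'(.*)_(\w\w)_(.*)', r'\2', x))
# C: drop the leading date from column A
df['C'] = df.A.apply(lambda x: re.sub(r'(\d+)_(.*)', r'\2', x))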
Result:
category A B C
0 20150115_Holiday_HK_Misc 20150115_Holiday_Misc HK Holiday_Misc
1 20150115_Holiday_SG_Misc 20150115_Holiday_Misc SG Holiday_Misc
2 20140116_DE_ProductFocus 20140116_ProductFocus DE ProductFocus
3 20140116_UK_ProductFocus 20140116_ProductFocus UK ProductFocus
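The apply/re.sub calls above can also be written with pandas' vectorized Series.str.replace, which accepts the same patterns and backreferences such as \1 when regex=True. A sketch on the question's data that produces the same columns:

```python
import pandas as pd

df = pd.DataFrame({"category": [
    "20150115_Holiday_HK_Misc",
    "20150115_Holiday_SG_Misc",
    "20140116_DE_ProductFocus",
    "20140116_UK_ProductFocus",
]})

# Same pattern as the re.sub version: prefix, two-character code, suffix
pattern = r"(.*)_(\w\w)_(.*)"

# A: keep groups 1 and 3, dropping the code between them
df["A"] = df["category"].str.replace(pattern, r"\1_\3", regex=True)
# B: keep only group 2, the code
df["B"] = df["category"].str.replace(pattern, r"\2", regex=True)
# C: drop the leading digits and underscore from A
df["C"] = df["A"].str.replace(r"(\d+)_(.*)", r"\2", regex=True)

print(df)
```

This avoids a Python-level lambda per row while keeping the regexes unchanged.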
Upvotes: 1