Reputation: 19
I have a dataframe df_a with a column named 'Language', where each row holds a list (array) of languages. I want to create another column, LanguageCode, that holds the language code associated with each entry in Language.
df_a = pd.DataFrame({'Language':[['cantonese', 'japanese',
'mandarin','american'],['mandarin','english'],
['american', 'mandarin','cantonese']]})
Desired output for df_a:
Language LanguageCode
0 [cantonese, japanese, mandarin, american] [zh_yue,ja,cmn,us]
1 [mandarin, english] [cmn,en]
2 [american, mandarin, cantonese] [us,cmn,zh_yue]
Upvotes: 1
Views: 33
Reputation: 672
I assume you have a dictionary that maps each language to its language code. With that, you can translate every list row by row using map.
See if this helps:
import pandas as pd
import numpy as np
df_a = pd.DataFrame({'Language':[['cantonese', 'japanese',
'mandarin','american'],['mandarin','english'],
['american', 'mandarin','cantonese']]})
#this is the hypothetical dictionary
lang_codes = {'cantonese': 'zh_yue','japanese': 'ja', 'mandarin': 'cmn','american': 'us','english': 'en'}
df_a['Language Code'] = [list(map(lambda x: lang_codes[x], row)) for row in df_a.Language]
#getting the numpy array format (dtype=object, one list per row)
language_code = np.array(df_a['Language Code'])
type(language_code)  # numpy.ndarray
And your dataframe will be:
Language Language Code
0 [cantonese, japanese, mandarin, american] [zh_yue, ja, cmn, us]
1 [mandarin, english] [cmn, en]
2 [american, mandarin, cantonese] [us, cmn, zh_yue]
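One caveat: lang_codes[x] raises a KeyError as soon as a row contains a language that is not in the dictionary. Below is a minimal sketch of a more forgiving variant, assuming the same hypothetical lang_codes dictionary; the helper name to_codes and its default parameter are my own, not something from your code.

```python
import pandas as pd

df_a = pd.DataFrame({'Language': [['cantonese', 'japanese', 'mandarin', 'american'],
                                  ['mandarin', 'english'],
                                  ['american', 'mandarin', 'cantonese']]})

# same hypothetical dictionary as above
lang_codes = {'cantonese': 'zh_yue', 'japanese': 'ja', 'mandarin': 'cmn',
              'american': 'us', 'english': 'en'}

def to_codes(languages, mapping, default=None):
    """Translate a list of language names; unknown names become `default`."""
    return [mapping.get(name, default) for name in languages]

df_a['Language Code'] = [to_codes(row, lang_codes) for row in df_a['Language']]

# the rows have different lengths, so this is a 1-D object array of Python lists
language_code = df_a['Language Code'].to_numpy()
print(language_code.dtype, language_code.shape)  # object (3,)
```

With this version, to_codes(['klingon'], lang_codes) returns [None] instead of raising, so you can spot missing entries afterwards and extend the dictionary.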
Upvotes: 1