Jeff Magouirk

Reputation: 19

Create another numpy.array in a pandas DataFrame based upon conditionals

I have a DataFrame df_a with a column named 'Language', where each row holds a list of language names. I want to create another column, LanguageCode, that holds the language code associated with each entry in Language.

df_a = pd.DataFrame({'Language':[['cantonese', 'japanese', 
                 'mandarin','american'],['mandarin','english'], 
                 ['american', 'mandarin','cantonese']]})

df_a

    Language                                    LanguageCode
0   [cantonese, japanese, mandarin, american]   [zh_yue, ja, cmn, us]
1   [mandarin, english]                         [cmn, en]
2   [american, mandarin, cantonese]             [us, cmn, zh_yue]

Upvotes: 1

Views: 33

Answers (1)

O Pardal

Reputation: 672

I assumed that you have a dictionary that maps each language to its language code, and then used map to look up every entry.

Please check whether this helps:

Assumptions:

import pandas as pd
import numpy as np

df_a = pd.DataFrame({'Language':[['cantonese', 'japanese', 
                 'mandarin','american'],['mandarin','english'], 
                 ['american', 'mandarin','cantonese']]})

#this is the hypothetical dictionary
lang_codes = {'cantonese': 'zh_yue','japanese': 'ja', 'mandarin': 'cmn','american': 'us','english': 'en'}

What you can do:

df_a['Language Code'] = [list(map(lambda x: lang_codes[x], row)) for row in df_a.Language]
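Note that lang_codes[x] raises a KeyError if a language is missing from the dictionary. As a rough sketch of one way around that, assuming you would rather get a placeholder than an error, you could look codes up with dict.get instead. The helper name to_codes and the None default are my own choices for illustration, not something from the question:

```python
import pandas as pd

df_a = pd.DataFrame({'Language': [['cantonese', 'japanese', 'mandarin', 'american'],
                                  ['mandarin', 'english'],
                                  ['american', 'mandarin', 'cantonese']]})

# same hypothetical dictionary as above
lang_codes = {'cantonese': 'zh_yue', 'japanese': 'ja', 'mandarin': 'cmn',
              'american': 'us', 'english': 'en'}

def to_codes(languages, mapping=lang_codes, default=None):
    """Hypothetical helper: translate a list of language names to codes.

    Names missing from the mapping become `default` instead of raising KeyError.
    """
    return [mapping.get(name, default) for name in languages]

# apply the helper to every row of the 'Language' column
df_a['Language Code'] = df_a['Language'].apply(to_codes)
```

With this version, to_codes(['klingon', 'english']) would give [None, 'en'] rather than failing, so unmapped languages are easy to spot afterwards.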

Checking:

#getting the numpy array format
language_code = np.array(df_a['Language Code'])

type(language_code)

numpy.ndarray
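One thing to keep in mind: because the rows have different lengths, the result is a one-dimensional array of dtype object whose elements are Python lists, not a two-dimensional array of strings. A small sketch to illustrate, using a standalone Series with the same codes (the variable names codes and flat are just for this example):

```python
import numpy as np
import pandas as pd

# same values as the 'Language Code' column, rebuilt here so the snippet is standalone
codes = pd.Series([['zh_yue', 'ja', 'cmn', 'us'],
                   ['cmn', 'en'],
                   ['us', 'cmn', 'zh_yue']])

language_code = np.array(codes)

print(language_code.dtype)   # object
print(language_code.shape)   # (3,)

# if a flat array of all codes is needed, flatten the lists explicitly
flat = np.array([code for row in language_code for code in row])
```

If you need one code per element instead of one list per row, the flattening step at the end is one option; whether that fits depends on how you plan to use the array.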

And your dataframe will be:

    Language                                    Language Code
0   [cantonese, japanese, mandarin, american]   [zh_yue, ja, cmn, us]
1   [mandarin, english]                         [cmn, en]
2   [american, mandarin, cantonese]             [us, cmn, zh_yue]

Upvotes: 1
