thenoirlatte

Reputation: 349

How to split a list in a column into two columns in a DataFrame using Python?

How can I split a list in a column into two columns in a DataFrame using Python? For example:

  row  |  column_A                
  ==================================
  1    |[('Ahli', 'NNP'),          |
       | ('paleontologi', 'NNP'),  | 
       | ('Thomas', 'NNP'),        |
       | ('dan', 'CC'),            |
       | ('timnya', 'RB'),         |
       | ('.', 'Z')],              |
  2    |[('fosil', 'NN'),          |
       | ('mamalia', 'NN'),        |
       | ('yang', 'SC'),           |
       | ('menghuni', 'VB'),       |
       | ('Antartika', 'NNP')]     |

I want to get only the second string from each tuple in the list:

  row  |  column_A                 | postag
  =======================================
  1    |[('Ahli', 'NNP'),          |[('NNP'),
       | ('paleontologi', 'NNP'),  | ('NNP'),
       | ('Thomas', 'NNP'),        | ('NNP'),
       | ('dan', 'CC'),            | ('CC'),
       | ('timnya', 'RB'),         | ('RB'),
       | ('.', 'Z')],              | ('Z')],
  2    |[('fosil', 'NN'),          |[('NN'),
       | ('mamalia', 'NN'),        | ('NN'),
       | ('yang', 'SC'),           | ('SC'),
       | ('menghuni', 'VB'),       | ('VB'),
       | ('Antartika', 'NNP')]     | ('NNP')]

Upvotes: 1

Views: 124

Answers (4)

Sebastien D

Reputation: 4482

You could achieve this with the following apply function:

import pandas as pd

data = [{'column_A': [('Ahli', 'NNP'),
        ('paleontologi', 'NNP'),
        ('Thomas', 'NNP'),
        ('dan', 'CC'),
        ('timnya', 'RB'),
        ('.', 'Z')]},
        {'column_A': [('fosil', 'NN'),
        ('mamalia', 'NN'),
        ('yang', 'SC'),
        ('menghuni', 'VB'),
        ('Antartika', 'NNP')]}]

df = pd.DataFrame(data)
# For each row, keep only the second element (the POS tag) of every tuple
df['postag'] = df['column_A'].apply(lambda x: [y[1] for y in x])
df

Output:

    column_A                                            postag
0   [(Ahli, NNP), (paleontologi, NNP), (Thomas, NN...   [NNP, NNP, NNP, CC, RB, Z]
1   [(fosil, NN), (mamalia, NN), (yang, SC), (meng...   [NN, NN, SC, VB, NNP]

Upvotes: 1

Biranchi

Reputation: 16327

Try using the apply function on the existing column to get a new column with the desired result.

Example Pseudocode:

df['postag'] = df['column_A'].apply(your_function)

In your_function, write the logic for separating the POS tags from the list of tuples.
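As a minimal sketch of what such a function could look like: the name extract_pos_tags and the shortened sample data are illustrative assumptions, not part of the question.

```python
import pandas as pd


def extract_pos_tags(pairs):
    """Return the second element (the POS tag) of each (word, tag) tuple."""
    return [tag for _word, tag in pairs]


# Shortened sample data in the same shape as the question's column_A
df = pd.DataFrame({
    "column_A": [
        [("Ahli", "NNP"), ("dan", "CC")],
        [("fosil", "NN"), ("yang", "SC")],
    ]
})

# apply calls extract_pos_tags once per row, passing that row's list of tuples
df["postag"] = df["column_A"].apply(extract_pos_tags)
print(df)
```

A named function is easier to test and reuse than an inline lambda, although both give the same column.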

Upvotes: 1

Ajay A

Reputation: 1068

Adding to @Biranchi's answer, the correct answer would be

df['postag'] = df['column_A'].apply(lambda x: [(i[1],) for i in x])

Result would be

# print(df)

                                        column_A                      postag
0  [(Ahli, NNP), (paleontologi, NNP), (Thomas, NN...  [(NNP,), (NNP,), (NNP,), ...
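Whether the tags should be plain strings or one-element tuples depends on how the question's ('NNP') notation is read. In Python, parentheses alone do not make a tuple; the trailing comma does. A small sketch of the difference, using shortened sample data as an assumption:

```python
# Parentheses alone do not create a tuple; the trailing comma does.
plain = ('NNP')      # just the string 'NNP'
single = ('NNP',)    # a one-element tuple

pairs = [('Ahli', 'NNP'), ('dan', 'CC')]

# Tags as plain strings
as_strings = [tag for _, tag in pairs]

# Tags wrapped in one-element tuples, as in the lambda above
as_tuples = [(tag,) for _, tag in pairs]

print(type(plain).__name__, type(single).__name__)
print(as_strings)
print(as_tuples)
```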

Upvotes: 2

Shubham Sharma

Reputation: 71689

Use Series.map to apply a custom mapping function that transforms each list in column_A as required:

df['postag'] = df['column_A'].map(lambda l: [b for a, b in l])

Another possible idea:

df['postag'] = [[y for x, y in lst] for lst in df['column_A']]

Result:

# print(df)

                                            column_A                      postag
0  [(Ahli, NNP), (paleontologi, NNP), (Thomas, NN...  [NNP, NNP, NNP, CC, RB, Z]
1  [(fosil, NN), (mamalia, NN), (yang, SC), (meng...       [NN, NN, SC, VB, NNP]
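If both halves of each tuple are wanted as separate columns, as the question title suggests, one possible sketch is to transpose each list with zip. The helper name unzip_pairs and the column name word are assumptions made for illustration.

```python
import pandas as pd


def unzip_pairs(pairs):
    """Split [(w1, t1), (w2, t2), ...] into ([w1, w2, ...], [t1, t2, ...])."""
    # zip(*pairs) transposes the list of tuples; guard against empty lists
    words, tags = zip(*pairs) if pairs else ((), ())
    return list(words), list(tags)


# Shortened sample data in the same shape as the question's column_A
df = pd.DataFrame({
    "column_A": [
        [("Ahli", "NNP"), ("dan", "CC")],
        [("fosil", "NN")],
    ]
})

words_and_tags = [unzip_pairs(pairs) for pairs in df["column_A"]]
df["word"] = [words for words, _tags in words_and_tags]
df["postag"] = [tags for _words, tags in words_and_tags]
print(df)
```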

Upvotes: 1
