Reputation: 349
How to split a list in a column into two columns in a DataFrame using Python? For example:
row | column_A
=================================
1   | [('Ahli', 'NNP'),
    |  ('paleontologi', 'NNP'),
    |  ('Thomas', 'NNP'),
    |  ('dan', 'CC'),
    |  ('timnya', 'RB'),
    |  ('.', 'Z')]
2   | [('fosil', 'NN'),
    |  ('mamalia', 'NN'),
    |  ('yang', 'SC'),
    |  ('menghuni', 'VB'),
    |  ('Antartika', 'NNP')]
I want to get only the second string from each tuple in the list:
row | column_A                  | postag
==============================================
1   | [('Ahli', 'NNP'),         | [('NNP'),
    |  ('paleontologi', 'NNP'), |  ('NNP'),
    |  ('Thomas', 'NNP'),       |  ('NNP'),
    |  ('dan', 'CC'),           |  ('CC'),
    |  ('timnya', 'RB'),        |  ('RB'),
    |  ('.', 'Z')]              |  ('Z')]
2   | [('fosil', 'NN'),         | [('NN'),
    |  ('mamalia', 'NN'),       |  ('NN'),
    |  ('yang', 'SC'),          |  ('SC'),
    |  ('menghuni', 'VB'),      |  ('VB'),
    |  ('Antartika', 'NNP')]    |  ('NNP')]
Upvotes: 1
Views: 124
Reputation: 4482
You could achieve this with the following apply function:
import pandas as pd

data = [{'column_A': [('Ahli', 'NNP'),
                      ('paleontologi', 'NNP'),
                      ('Thomas', 'NNP'),
                      ('dan', 'CC'),
                      ('timnya', 'RB'),
                      ('.', 'Z')]},
        {'column_A': [('fosil', 'NN'),
                      ('mamalia', 'NN'),
                      ('yang', 'SC'),
                      ('menghuni', 'VB'),
                      ('Antartika', 'NNP')]}]
df = pd.DataFrame(data)
# For each cell, keep the second element (the POS tag) of every tuple
df['postag'] = df['column_A'].apply(lambda x: [y[1] for y in x])
print(df)
Output:
column_A postag
0 [(Ahli, NNP), (paleontologi, NNP), (Thomas, NN... [NNP, NNP, NNP, CC, RB, Z]
1 [(fosil, NN), (mamalia, NN), (yang, SC), (meng... [NN, NN, SC, VB, NNP]
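Since the title asks about splitting the list into two columns, the same apply pattern can pull out both halves of each pair. A minimal sketch, assuming every cell is a list of two-element (token, tag) tuples; the tokens column name is a hypothetical choice:

```python
import pandas as pd

# Sample data from the question: each cell is a list of (token, tag) tuples
df = pd.DataFrame({'column_A': [
    [('Ahli', 'NNP'), ('paleontologi', 'NNP'), ('Thomas', 'NNP'),
     ('dan', 'CC'), ('timnya', 'RB'), ('.', 'Z')],
    [('fosil', 'NN'), ('mamalia', 'NN'), ('yang', 'SC'),
     ('menghuni', 'VB'), ('Antartika', 'NNP')],
]})

# Unpack every pair and keep one side of it per new column
df['tokens'] = df['column_A'].apply(lambda pairs: [token for token, _ in pairs])
df['postag'] = df['column_A'].apply(lambda pairs: [tag for _, tag in pairs])

print(df[['tokens', 'postag']])
```

Each new column still holds one list per row, so the row alignment with column_A is preserved.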
Upvotes: 1
Reputation: 16327
Try using the apply function on the existing column to get a new column with the desired result.
Example pseudocode:
df['postag'] = df['column_A'].apply(your_function)
In your_function, write the logic that extracts the POS tags from the list of tuples.
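For instance, your_function could look like the sketch below. The name extract_tags is hypothetical, and treating non-list cells (such as None or NaN) as an empty list is an assumption, not something the question specifies:

```python
import pandas as pd

def extract_tags(pairs):
    """Return the second element (the POS tag) of each (token, tag) tuple."""
    # Assumption: a missing or non-list cell yields an empty list of tags
    if not isinstance(pairs, list):
        return []
    return [tag for _token, tag in pairs]

df = pd.DataFrame({'column_A': [
    [('fosil', 'NN'), ('mamalia', 'NN'), ('yang', 'SC')],
    None,
]})
df['postag'] = df['column_A'].apply(extract_tags)
print(df['postag'].tolist())  # [['NN', 'NN', 'SC'], []]
```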
Upvotes: 1
Reputation: 1068
Adding to @Biranchi's answer: to wrap each tag in a one-element tuple, as written in the expected output, use
df['postag'] = df['column_A'].apply(lambda x: [(i[1],) for i in x])
The result would be:
# print(df)
column_A postag
0 [(Ahli, NNP), (paleontologi, NNP), (Thomas, NN... [(NNP,), (NNP,), (NNP,), ...
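Whether you want tuples at all comes down to Python syntax: in the expected output, ('NNP') is just the string 'NNP', because parentheses alone do not create a tuple; the trailing comma does. A quick check:

```python
# Parentheses alone do not create a tuple; the trailing comma does
plain = ('NNP')    # the string 'NNP'
single = ('NNP',)  # a one-element tuple

print(type(plain).__name__)   # str
print(type(single).__name__)  # tuple

pairs = [('dan', 'CC'), ('timnya', 'RB')]
as_strings = [p[1] for p in pairs]    # ['CC', 'RB']
as_tuples = [(p[1],) for p in pairs]  # [('CC',), ('RB',)]
print(as_strings, as_tuples)
```

So [i[1] for i in x] yields a list of strings, while [(i[1],) for i in x] yields a list of one-element tuples; pick whichever your downstream code expects.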
Upvotes: 2
Reputation: 71689
Use Series.map to apply a custom mapping function that maps each list in column_A
according to the desired requirements:
df['postag'] = df['column_A'].map(lambda l: [b for a, b in l])
Another possible idea is a plain list comprehension over the column:
df['postag'] = [[y for x, y in lst] for lst in df['column_A']]
Result:
# print(df)
column_A postag
0 [(Ahli, NNP), (paleontologi, NNP), (Thomas, NN... [NNP, NNP, NNP, CC, RB, Z]
1 [(fosil, NN), (mamalia, NN), (yang, SC), (meng... [NN, NN, SC, VB, NNP]
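A variant of the map approach uses operator.itemgetter instead of tuple unpacking. This is a sketch, not part of the original answer; one practical difference is that itemgetter(1) still works if a tuple ever carries more than two fields, whereas for a, b in l raises ValueError in that case:

```python
from operator import itemgetter

import pandas as pd

df = pd.DataFrame({'column_A': [
    [('Ahli', 'NNP'), ('paleontologi', 'NNP'), ('Thomas', 'NNP'),
     ('dan', 'CC'), ('timnya', 'RB'), ('.', 'Z')],
    [('fosil', 'NN'), ('mamalia', 'NN'), ('yang', 'SC'),
     ('menghuni', 'VB'), ('Antartika', 'NNP')],
]})

get_tag = itemgetter(1)  # get_tag(t) returns t[1]

# The built-in map applies get_tag to every tuple in one cell's list
df['postag'] = df['column_A'].map(lambda l: list(map(get_tag, l)))

print(df['postag'].tolist())
# [['NNP', 'NNP', 'NNP', 'CC', 'RB', 'Z'], ['NN', 'NN', 'SC', 'VB', 'NNP']]
```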
Upvotes: 1