Reputation: 7644
I have a table in a pandas DataFrame:
bigram frequency
(123,3245) 2
(676,35346) 84
(93,32) 9
and so on, for 50 rows.
What I am looking for is a way to split the bigram column into two separate columns, removing the brackets and the comma, like this:
col1 col2 frequency
123 3245 2
676 35346 84
93 32 9
Is there any way to split it at the comma and remove the brackets?
Upvotes: 3
Views: 1488
Reputation: 294488
This is very close to @Psidom's answer.
I use pd.DataFrame(df.bigram.values.tolist(), columns=['c1', 'c2'])
instead of df.bigram.apply(lambda x: pd.Series(x, index=['col1', 'col2']))
pd.concat([pd.DataFrame(df.bigram.values.tolist(), columns=['c1', 'c2']),
           df.drop('bigram', axis=1)],
          axis=1)
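As a self-contained sketch of the same idea: the sample frame below is an assumption built from the rows in the question, and the columns are named col1/col2 to match the requested output rather than c1/c2. Passing index=df.index keeps the rows aligned in concat even if df does not have a default 0..n-1 index.

```python
import pandas as pd

# Illustrative data copied from the question; assumes bigram holds real tuples.
df = pd.DataFrame({
    "bigram": [(123, 3245), (676, 35346), (93, 32)],
    "frequency": [2, 84, 9],
})

# Expand each tuple into two columns, reusing df's index so concat aligns rows.
pairs = pd.DataFrame(df["bigram"].tolist(), columns=["col1", "col2"], index=df.index)

# Put the new columns next to everything except the original bigram column.
result = pd.concat([pairs, df.drop(columns="bigram")], axis=1)
print(result)
```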
Upvotes: 2
Reputation: 36691
Try creating a new column for each element of the tuple.
df['col1'] = df['bigram'].apply(lambda x: x[0])
df['col2'] = df['bigram'].apply(lambda x: x[1])
To create a data frame with ONLY col1, col2, and frequency, where the column order is important, it is easier to create a new data frame altogether and populate it.
df_new = pd.DataFrame()
df_new['col1'] = df['bigram'].apply(lambda x: x[0])
df_new['col2'] = df['bigram'].apply(lambda x: x[1])
df_new['frequency'] = df['frequency']
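A possible shorter variant of the same approach, assuming bigram holds 2-tuples: unpack the column with zip and build the new frame in one step. The sample data is taken from the question; this is a sketch, not the only way to do it.

```python
import pandas as pd

# Illustrative data copied from the question; assumes bigram holds 2-tuples.
df = pd.DataFrame({
    "bigram": [(123, 3245), (676, 35346), (93, 32)],
    "frequency": [2, 84, 9],
})

# zip(*...) transposes the sequence of pairs into two sequences.
col1, col2 = zip(*df["bigram"])

# The dict order fixes the column order: col1, col2, frequency.
# to_numpy() drops the index so all three columns are plain arrays of equal length.
df_new = pd.DataFrame({
    "col1": list(col1),
    "col2": list(col2),
    "frequency": df["frequency"].to_numpy(),
})
print(df_new)
```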
Upvotes: 2
Reputation: 215057
If your bigram column happens to be in string format, you can use the .str.extract() method with a regex to extract the numbers from it:
pd.concat([df.bigram.str.extract(r'(?P<col1>\d+),(?P<col2>\d+)'), df.frequency], axis=1)
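A runnable sketch of the string case, assuming the values look like '(123,3245)' as in the question. Two details here are my additions rather than part of the answer: \s* tolerates an optional space after the comma, and astype(int) converts the extracted text to integers, since str.extract returns strings.

```python
import pandas as pd

# Illustrative data: bigram stored as text, as it might be after reading a CSV.
df = pd.DataFrame({
    "bigram": ["(123,3245)", "(676,35346)", "(93,32)"],
    "frequency": [2, 84, 9],
})

# Named groups become the column names of the extracted frame.
pattern = r"\((?P<col1>\d+),\s*(?P<col2>\d+)\)"

# astype(int) assumes every row matches; a non-matching row would yield NaN and fail here.
parts = df["bigram"].str.extract(pattern).astype(int)

result = pd.concat([parts, df["frequency"]], axis=1)
print(result)
```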
Or, if the bigram column is of tuple type:
Method 1: use pd.Series to create columns from the tuple:
pd.concat([df.bigram.apply(lambda x: pd.Series(x, index=['col1', 'col2'])),
df.frequency], axis=1)
Method 2: use .str to get the first and second elements of the tuple:
df['col1'], df['col2'] = df.bigram.str[0], df.bigram.str[1]
df = df.drop('bigram', axis=1)
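Method 2 can be tried end to end with the sketch below. The sample data comes from the question, and the final column reordering is an assumption added to match the layout the question asks for, because the new columns are otherwise appended after frequency.

```python
import pandas as pd

# Illustrative data copied from the question; assumes bigram holds real tuples.
df = pd.DataFrame({
    "bigram": [(123, 3245), (676, 35346), (93, 32)],
    "frequency": [2, 84, 9],
})

# The .str accessor indexes into each element, which also works for tuples.
df["col1"] = df["bigram"].str[0]
df["col2"] = df["bigram"].str[1]

# Drop the original column and put the new columns first.
df = df.drop(columns="bigram")[["col1", "col2", "frequency"]]
print(df)
```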
Upvotes: 3