Reputation: 4630
I have file .txt that contains a list of words like this:
5.91686268506 exclusively, catering, provides, arms, georgia, formal, purchase, choose
5.91560417296 hugh, senlis
5.91527936181 italians
5.91470429433 soil, cultivation, fertile
5.91468087491 increases, moderation
....
5.91440227412 farmers, descendants
I would like to transform such data into a pandas table that I expect to show into a html/bootstrap template as follows (*):
COL_A COL_B
5.91686268506 exclusively, catering, provides, arms, georgia, formal, purchase, choose
5.91560417296 hugh, senlis
5.91527936181 italians
5.91470429433 soil, cultivation, fertile
5.91468087491 increases, moderation
....
5.91440227412 farmers, descendants
So I tried the following with pandas:
import pandas as pd
df = pd.read_csv('file.csv',
sep = ' ', names=['Col_A', 'Col_B'])
df.head(20)
However, my table doesnt have the above desired sructure:
COL_A COL_B
6.281426 engaged, chance, makes, meeting, nations, things, believe, tries, believing, knocked, admits, awkward
6.277438 sweden NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6.271190 artificial, ammonium NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6.259790 boats, prefix NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6.230612 targets, tactical, wing, missile, squadrons NaN NaN NaN NaN NaN NaN NaN
Any idea of how to get the data as the (*) tabular format?
Upvotes: 2
Views: 65
Reputation: 214927
Because you have space between words and if you specify space as delimiter, it will naturally separate them. To get what you need, you can try to set the sep
as a regular expression (?<!,)
, ?<!
is a negative look behind syntax, which means separate on space only when it is not preceded by a comma and it should work for your case:
pd.read_csv("~/test.csv", sep = "(?<!,) ", names=['weight', 'topics'])
# weight topics
#0 5.916863 exclusively, catering, provides, arms, georgia...
#1 5.915604 hugh, senlis
#2 5.915279 italians
#3 5.914704 soil, cultivation, fertile
#4 5.914681 increases, moderation
#5 5.914402 farmers, descendants
Upvotes: 6