Reputation: 57
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd
TESTDATA = StringIO(txt)
df = pd.read_csv(TESTDATA, names=['col1'])
where
txt="The lion (Panthera leo) is a species in the family Felidae;it is a muscular, deep-chested cat with a short, rounded head, a reduced neck and round ears, and a hairy tuft at the end of its tail. The lion is sexually dimorphic; males are larger than females with a typical weight range of 150 to 250 kg (330 to 550 lb) for males and 120 to 182 kg (265 to 400 lb) for females. "
When I run the above code, I get this output:
The lion (Panthera leo) is a species in the family Felidae;it is a muscular deep-chested cat with a short rounded head a reduced neck and round ears and a hairy tuft at the end of its tail
I get 4 different columns, with the last column labeled col1. What I want is a single column containing the full text. How can I achieve this? I want to convert the txt data to a DataFrame with a single column.
Upvotes: 1
Views: 297
Reputation: 46401
But what I want is a single column with the full data. How can I achieve it? I want to convert the txt data to a DataFrame with a single column.
from io import StringIO
import pandas as pd
txt="The lion (Panthera leo) is a species in the family Felidae;it is a muscular, deep-chested cat with a short, rounded head, a reduced neck and round ears, and a hairy tuft at the end of its tail. The lion is sexually dimorphic; males are larger than females with a typical weight range of 150 to 250 kg (330 to 550 lb) for males and 120 to 182 kg (265 to 400 lb) for females. "
memory_file = StringIO(txt)
df = pd.read_csv(memory_file, sep=r'\n', header=None, engine='python', names=["cname"])
print(df)
print(df.size)
Upvotes: 0
Reputation: 963
When you read data with pd.read_csv, the default delimiter is a comma, so the text is split at every comma.
To split on a different delimiter, pass it explicitly, for example pd.read_csv(TESTDATA, sep=';').
To ignore all delimiters and keep the text in one column, pass a delimiter that does not occur in the data, such as sep='###'.
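A minimal sketch of the unused-delimiter idea, assuming a shortened sample string in place of the full text from the question. Separators longer than one character are interpreted by pandas as regular expressions, so engine='python' is passed explicitly here. The df_direct line is an alternative not mentioned above: it builds the DataFrame from the string directly, without going through the CSV parser.

```python
from io import StringIO

import pandas as pd

# Shortened sample text (illustrative only); it contains commas and a semicolon.
txt = (
    "The lion (Panthera leo) is a species in the family Felidae;"
    "it is a muscular, deep-chested cat with a short, rounded head."
)

# '###' never occurs in txt, so the line is not split into several columns.
# A multi-character separator is treated as a regular expression, which
# requires the Python parsing engine.
df = pd.read_csv(
    StringIO(txt),
    sep="###",
    engine="python",
    header=None,
    names=["col1"],
)

# Alternative (an assumption, not from the answer above): skip read_csv
# and wrap the string in a one-element list.
df_direct = pd.DataFrame({"col1": [txt]})

print(df.shape)
print(df_direct.shape)
```

Note that read_csv still treats each line of the input as a separate row, so a text containing line breaks would likely produce one row per line rather than a single cell.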
Upvotes: 1