Saloni Agrawal

Reputation: 57

DataFrame to String

import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd
TESTDATA = StringIO(txt)  # txt is the string shown below
df = pd.read_csv(TESTDATA, names=['col1'])

where

txt="The lion (Panthera leo) is a species in the family Felidae;it is a muscular, deep-chested cat with a short, rounded head, a reduced neck and round ears, and a hairy tuft at the end of its tail. The lion is sexually dimorphic; males are larger than females with a typical weight range of 150 to 250 kg (330 to 550 lb) for males and 120 to 182 kg (265 to 400 lb) for females. "

When I run the above code, I get this output:

The lion (Panthera leo) is a species in the family Felidae;it is a muscular deep-chested cat with a short   rounded head    a reduced neck and round ears   and a hairy tuft at the end of its tail

I get four separate columns, with the last one labeled col1. What I want is a single column containing the full text. How can I convert the text in txt to a DataFrame with a single column?

Upvotes: 1

Views: 297

Answers (2)

prosti

Reputation: 46401

But what I want is a single column with the full data. How to achieve it? I want to convert the txt data to a DataFrame with a single column.

from io import StringIO
import pandas as pd

txt="The lion (Panthera leo) is a species in the family Felidae;it is a muscular, deep-chested cat with a short, rounded head, a reduced neck and round ears, and a hairy tuft at the end of its tail. The lion is sexually dimorphic; males are larger than females with a typical weight range of 150 to 250 kg (330 to 550 lb) for males and 120 to 182 kg (265 to 400 lb) for females. "

# Wrap the string in a file-like object so read_csv can parse it
memory_file = StringIO(txt)
# Use a separator pattern other than ',' so the commas in the text are not split on
df = pd.read_csv(memory_file, sep=r'\n', header=None, engine='python', names=["cname"])
print(df)
print(df.size)  # total number of cells in the DataFrame

Upvotes: 0

Zubda

Reputation: 963

pd.read_csv splits on commas by default. To split on a different delimiter, pass it explicitly, for example pd.read_csv(TESTDATA, sep=';'). To ignore all delimiters and keep each line in a single column, use a separator that does not occur in the data, such as sep='###'.
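
A minimal sketch of the approach described here, assuming a shortened stand-in for the question's text and a single input line. Because '###' is longer than one character, pandas treats it as a regular expression, so engine='python' is passed explicitly. The last line is an alternative that is not part of this answer: it builds the DataFrame directly and skips CSV parsing.

```python
from io import StringIO

import pandas as pd

# Shortened stand-in for the question's text (an assumption, not the full string).
txt = ("The lion (Panthera leo) is a species in the family Felidae;"
       "it is a muscular, deep-chested cat with a short, rounded head.")

# Default behaviour: sep=',' splits the line at each of its two commas.
by_comma = pd.read_csv(StringIO(txt), header=None)

# A separator that never occurs in the text leaves each line in one column.
single = pd.read_csv(
    StringIO(txt),
    sep="###",
    header=None,
    names=["col1"],
    engine="python",
)

# Alternative without read_csv: wrap the string in a one-row DataFrame.
direct = pd.DataFrame({"col1": [txt]})

print(by_comma.shape, single.shape, direct.shape)
```

If the text is already in memory as one string, the direct construction avoids picking a separator at all; read_csv with an unused separator is more useful when the input has several lines that should each become a row.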

Upvotes: 1
