ViSa
ViSa

Reputation: 2247

Unable to convert text into dataframe in python

I am trying to convert a text into a dataframe using Python.

sample_text: 'This is \nsample text\n\nName|age\n--|--\n1.abc|45\n2.xyz|34'

Final Desired output:

enter image description here

Steps I am following to achieve the above output are listed below :

  1. Break the text into multiple rows and assign it to a variable: I have tried using print() to process this text formatted_text = print('This is \nsample text\n\nName|age\n--|--\n1.abc|45\n2.xyz|34') but it cant be assigned as print() returns NoneType, so I get an error here.

Desired output after this step:

This is 
sample text

Name|age
--|--
1.abc|45
2.xyz|34
  1. Use the above line break text stored in a variable to be read as a CSV with the separator | to create a dataframe: I have been thinking of processing this as pd.read_csv(formatted_text,sep='|', skipinitialspace=True)

Desired_output after this step:

enter image description here

I tried earlier explaining this problem in SO post but I guess I wasn't able to explain it well and it got closed. I Hope I am able to explain my issue this time. It could be a silly task but I have been stuck at this for a long time now and would appreciate any help.

Upvotes: 0

Views: 94

Answers (3)

PaulS
PaulS

Reputation: 25448

A possible solution:

text = 'This is \nsample text\n\nName|age\n--|--\n1.abc|45\n2.xyz|34'

pd.read_csv(StringIO(text), lineterminator='\n', engine='c', header=None)

Output:

             0
0     This is 
1  sample text
2     Name|age
3        --|--
4     1.abc|45
5     2.xyz|34

To split the columns, we can use str.split after read_csv:

(pd.read_csv(StringIO(text), lineterminator='\n', engine='c', header=None)[0]
 .str.split('|', expand=True))

Output:

             0     1
0     This is   None
1  sample text  None
2         Name   age
3           --    --
4        1.abc    45
5        2.xyz    34

Upvotes: 1

m-sarabi
m-sarabi

Reputation: 2305

We can split each line by | into a list and create a dataframe from that:

import pandas as pd

text = 'This is \nsample text\n\nName|age\n--|--\n1.abc|45\n2.xyz|34'

array = [line.split('|') if '|' in line else [line, ''] for line in text.splitlines()]

df = pd.DataFrame(array)
print(df)

output:

             0    1
0     This is      
1  sample text     
2                  
3         Name  age
4           --   --
5        1.abc   45
6        2.xyz   34

Upvotes: 1

mozway
mozway

Reputation: 262224

You can just split the rows, then the columns and feed this to the DataFrame constructor, optionally fillna with '':

s = 'This is \nsample text\n\nName|age\n--|--\n1.abc|45\n2.xyz|34'
df = pd.DataFrame([x.split('|') for x in s.split('\n')]).fillna('')

Alternatively, using a Series and str.split:

df = pd.Series(s.splitlines()).str.split('|', expand=True).fillna('')

Output:

             0    1
0     This is      
1  sample text     
2                  
3         Name  age
4           --   --
5        1.abc   45
6        2.xyz   34

Upvotes: 2

Related Questions