Pandas: ignore new lines as separators in read_csv

Question

I have an input string that uses $$$Field$$$ as its delimiter. The string spans several lines. I need to return a list of all the items in the string, separated by $$$Field$$$ only.

In the example below I should receive ['Food', 'Fried Chicken', 'Banana'] as output. However, it seems that read_csv is interpreting the new lines as separators as well, so instead of a list I am getting a table. How can I ignore those new lines so that I just get a list back?

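import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in current pandas

temp = u"""Food$$$Field$$$Fried
Chicken$$$Field$$$Banana"""
# Raw string so the backslashes reach the regex engine unchanged
df = pd.read_csv(StringIO(temp), sep=r'\$\$\$Field\$\$\$', engine='python')
print(df)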

The only reason I am using pandas is that this string is actually a huge .csv file, and I cannot read it all into memory at once, but streaming processing would be acceptable.

victorlin · Accepted Answer

Since you are not looking to store your information in a tabular format, I don't think a DataFrame is necessary. Instead, read your input in chunks and yield the buffered text every time the delimiter '$$$Field$$$' is encountered.

Adapted from https://stackoverflow.com/a/16260159/4410590:

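def myreadlines(f, newline):
    # f is a file-like object opened in text mode; newline is the delimiter string
    buf = ""
    while True:
        # Emit every complete item already sitting in the buffer
        while newline in buf:
            pos = buf.index(newline)
            yield buf[:pos]
            buf = buf[pos + len(newline):]
        chunk = f.read(4096)
        if not chunk:
            # End of input: whatever is left is the last item
            yield buf
            break
        # A delimiter split across two chunks is found once both halves are buffered
        buf += chunk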

Then call the function:

> for x in myreadlines(StringIO(temp), '$$$Field$$$'):
      print(repr(x))

'Food'
'Fried\nChicken'
'Banana'
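
Since the real input is a large file rather than a string, the same idea can be applied to a file handle. What follows is only a minimal sketch under some assumptions: the name iter_fields, the chunk_size parameter, the temporary sample file, and the choice to replace embedded new lines with a space (to get 'Fried Chicken' as in the question) are all illustrative and not part of the original answer.

```python
import os
import tempfile

DELIMITER = "$$$Field$$$"


def iter_fields(f, delimiter, chunk_size=4096):
    """Yield the items of a text-mode file object, split on delimiter only."""
    buf = ""
    while True:
        # Emit every complete item currently in the buffer.
        while delimiter in buf:
            pos = buf.index(delimiter)
            yield buf[:pos]
            buf = buf[pos + len(delimiter):]
        chunk = f.read(chunk_size)
        if not chunk:
            # End of file: the remainder is the final item.
            yield buf
            break
        buf += chunk


# Write a small sample file standing in for the huge .csv (hypothetical data).
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Food$$$Field$$$Fried\nChicken$$$Field$$$Banana")
    path = tmp.name

try:
    with open(path, encoding="utf-8") as f:
        # A tiny chunk_size forces the delimiter to straddle chunk boundaries.
        # Assumption: embedded new lines should become a single space.
        items = [
            field.replace("\n", " ")
            for field in iter_fields(f, DELIMITER, chunk_size=8)
        ]
finally:
    os.remove(path)

print(items)  # ['Food', 'Fried Chicken', 'Banana']
```

Collecting everything into a list is only reasonable for small inputs. For a file that does not fit in memory, loop over iter_fields directly and handle one item at a time, so that only the current buffer is held in memory.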
