Import text file with uneven column number and complicated delimiter

Question

Say I have a text file like the one below:

apple  pear  banana  peach orange grape

dog  cat  white horse

salmon

tiger  lion  eagle hawk  monkey

Looking for output like:

"apple", "pear", "banana", "peach orange grape"

"dog", "cat", "white horse"

"salmon"

"tiger", "lion", "eagle hawk", "monkey"

Two problems:

1. Within each row, I only want to split on a double space ('  ').
2. The number of columns in each row is arbitrary, anywhere from 1 to 100.

How can I load them into a pandas dataframe?

In fact, I am wondering whether it is possible to do this without reading the file line by line, because my initial solution is:

1. Read each line and split it on the double space with a regex:

re.split(r'\s{2,}', line)

2. After splitting, insert each row into the DataFrame.

However, because the number of columns varies from row to row, I can't simply build a DataFrame that way. Passing names=[] to pd.read_csv() handles uneven columns, but it requires defining the column names, and therefore their number, in advance.

Any suggestions?

Thank you!

Jan · Accepted Answer

To provide another example in addition to the one provided by @JD Long, you could use a regular expression plus a list comprehension:

import re, pandas as pd

string = """
apple  pear  banana  peach orange grape

dog  cat  white horse

salmon

tiger  lion  eagle hawk  monkey
"""

rx = re.compile(r'''[ ]{2,}''')

items = [rx.split(line) for line in string.split("\n") if line]

df = pd.DataFrame.from_records(items)
print(df)

... which yields:

        0     1            2                   3
0   apple  pear       banana  peach orange grape
1     dog   cat  white horse                None
2  salmon  None         None                None
3   tiger  lion   eagle hawk              monkey
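
Since the question starts from a file rather than an in-memory string, the same idea can be sketched for a file on disk. This is only an illustration under stated assumptions: the helper name load_double_space_file and the UTF-8 encoding are hypothetical choices, not something given in the question. It still iterates over the lines, but the DataFrame is built once at the end, so the column count never has to be known in advance.

```python
import re

import pandas as pd

# Two or more consecutive spaces separate fields; single spaces stay inside a field.
FIELD_SEP = re.compile(r" {2,}")


def load_double_space_file(path):
    """Hypothetical helper: split each non-blank line of `path` on runs of 2+ spaces."""
    rows = []
    # Assumption: the file is UTF-8 encoded; adjust `encoding` if it is not.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()  # drop the newline and surrounding whitespace
            if line:  # skip blank lines
                rows.append(FIELD_SEP.split(line))
    # Rows shorter than the widest one are padded with missing values.
    return pd.DataFrame(rows)
```

Calling load_double_space_file on a file with the sample contents should give the same four-column layout as above, with the shorter rows padded by missing values.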
