Reputation: 796
Say I have text file like below:
apple pear banana, peach orange grape
dog cat white horse
salmon
tiger lion eagle hawk monkey
Looking for output like:
"apple", "pear", "banana", "peach orange grape"
"dog", "cat", "white horse"
"salmon"
"tiger", "lion", "eagle hawk", "monkey"
Two problems,
How can I load them into a pandas dataframe?
In fact I am wondering if it is possible to complete this without reading line by line, because I initial solution is:
read each line, use REX
re.split(r'\s{2,}', line)
to split by double space
however, coz the column number is random, I can't simply generate a DF by that. Adding names=[] in pd.read_csv() will handle uneven columns, but this requires pre define column names and number.
Any suggestion?
Thank you!
Upvotes: 1
Views: 2546
Reputation: 43169
To provide another example in addition to the one provided by @JD Long, you could use a regular expression plus a list comprehension:
import re, pandas as pd
string = """
apple pear banana peach orange grape
dog cat white horse
salmon
tiger lion eagle hawk monkey
"""
rx = re.compile(r'''[ ]{2,}''')
items = [(rx.split(line)) for line in string.split("\n") if line]
df = pd.DataFrame.from_records(items)
print(df)
... which yields:
0 1 2 3
0 apple pear banana peach orange grape
1 dog cat white horse None
2 salmon None None None
3 tiger lion eagle hawk monkey
Upvotes: 1
Reputation: 60746
read_table()
is your friend here...
df = pd.read_table('./test.txt', sep=" ", header=None)
df.to_csv('outfile.csv')
any missing columns will be filled with NaN.
The text file test.txt above is the following:
test 2 1
t
t2 1
t3
t4 3 4
and df looks like this:
0 1 2
0 test 2.0 1.0
1 t NaN NaN
2 t2 1.0 NaN
3 t3 NaN NaN
4 t4 3.0 4.0
To get the quoting in the output you may need to use the quoting
option from csv:
import csv
import pandas as pd
df = pd.read_table('./test.txt', sep=" ", header=None)
df.to_csv(quoting=csv.QUOTE_NONNUMERIC)
Upvotes: 2