Reputation: 3768
I have a text file:
John|Hopkins|||31
Sage|Jen|42
I want to read it into Python and split each line on '|'.
So I want something like:
[['John', 'Hopkins', '31'], ['Sage', 'Jen', '42']]
file = open('mytxt.txt', 'r')
file_2 = file.readlines()
lst = []
for line in file_2:
    line = line.strip('\n')
    line = line.split('|')
    lst.append(line)
print(lst)
I’m getting:
[['John', 'Hopkins', '', '', '31'], ['Sage', 'Jen', '42']]
As seen, there are empty strings ('') in the first list because of the consecutive | characters.
How do I modify the split statement to cater for single | and consecutive |||?
Upvotes: 1
Views: 148
Reputation: 191738
Use a regex to split on one or more consecutive | characters:
import re
with open('mytxt.txt') as f:
    for line in f:
        print(re.split(r'\|+', line.rstrip()))
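One caveat, as a rough sketch: re.split on one or more pipes still returns empty strings when a line starts or ends with a pipe. If that can happen in your data, collecting the runs of non-pipe characters with re.findall may be a safer variant. The helper name split_fields and the sample line are made up for illustration.

```python
import re

def split_fields(line):
    # Hypothetical helper: collect every run of non-pipe characters,
    # so leading, trailing, and repeated pipes never produce '' entries.
    return re.findall(r'[^|]+', line.rstrip('\n'))

sample = '|John||Hopkins|31|'          # made-up line with pipes at both ends
print(re.split(r'\|+', sample))        # still has '' at the start and end
print(split_fields(sample))            # only the non-empty fields
```

For lines shaped like the ones in the question, both approaches give the same result.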
Upvotes: 3
Reputation: 2596
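# Option 1: split on '|' and drop the empty strings
with open('mytxt.txt') as f:
    lst = [
        [word for word in line.rstrip().split('|') if word]
        for line in f
    ]
print(lst)

# Option 2: split on one or more consecutive '|' with a regex
import re
with open('mytxt.txt') as f:
    lst = [
        re.split(r'\|+', line.rstrip())
        for line in f
    ]
print(lst)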
Both produce the same output:
[['John', 'Hopkins', '31'], ['Sage', 'Jen', '42']]
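To try the split-and-filter approach without creating a file, here is a minimal sketch that assumes io.StringIO can stand in for the open file. The function name read_rows and its sep parameter are illustrative, not from the question, and skipping blank lines is an extra choice made here.

```python
import io

def read_rows(handle, sep='|'):
    # Hypothetical helper: split each line on sep and keep non-empty fields.
    rows = []
    for line in handle:
        fields = [field for field in line.rstrip('\n').split(sep) if field]
        if fields:  # assumption: skip lines that contain no fields at all
            rows.append(fields)
    return rows

# In-memory stand-in for mytxt.txt, using the two lines from the question.
sample = io.StringIO('John|Hopkins|||31\nSage|Jen|42\n')
rows = read_rows(sample)
print(rows)
```

Passing a real file works the same way: with open('mytxt.txt') as f: rows = read_rows(f).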
Upvotes: 0
Reputation: 1848
Reading the file into a pandas DataFrame and fetching the rows as lists would also work. Pass header=None so the first line is not treated as column names, and dtype=str so the values stay strings.
import pandas as pd
df = pd.read_csv("mytxt.txt", sep="|", header=None, dtype=str)
rowlist = []
for index, row in df.iterrows():
    one_row = [value for value in row if pd.notna(value)]  # drop the NaN cells left by empty fields
    rowlist.append(one_row)  # append to the main list
print(rowlist)
Output:
[['John', 'Hopkins', '31'], ['Sage', 'Jen', '42']]
Upvotes: 0
Reputation: 676
Add this below your line = line.split('|') line:
    line = [word for word in line if word != '']
Upvotes: 2
Reputation: 59228
You may try filtering out empty values:
line = list(filter(None, line.split("|")))
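A small sketch of one thing to watch for, assuming Python 3: filter returns a lazy iterator rather than a list, so it can only be consumed once. Wrapping it in list() gives you something you can append and print as in the question. The variable names are illustrative.

```python
line = 'John|Hopkins|||31'

# filter(None, ...) drops falsy items, which here means the empty strings.
lazy = filter(None, line.split('|'))

first_pass = list(lazy)    # consumes the iterator
second_pass = list(lazy)   # nothing left to read the second time

print(first_pass)
print(second_pass)
```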
Upvotes: 1