Maxxx

Reputation: 3768

Splitting by consecutive delimiter in text file

I have a text file:

John|Hopkins|||31
Sage|Jen|42

I want to read it into Python and split each line on '|'.

So I want something like:

[['John', 'Hopkins', '31'], ['Sage', 'Jen', '42']]

file = open('mytxt.txt', 'r')
file_2 = file.readlines()

lst=[]
for line in file_2:
    line=line.strip('\n')
    line=line.split('|')
    lst.append(line)
print(lst)

I’m getting:

[['John', 'Hopkins', '', '', '31'], ['Sage', 'Jen', '42']]

As shown, the first list contains empty strings ('') because of the consecutive | characters.

How do I modify the split statement to handle both a single | and consecutive |||?

Upvotes: 1

Views: 148

Answers (5)

OneCricketeer

Reputation: 191738

Use a regex to split on one or more consecutive | characters:

import re

with open('mytxt.txt') as f:
    for line in f:
        print(re.split(r'\|+', line.rstrip()))

Upvotes: 3

Boseong Choi

Reputation: 2596

Here is the full code, in two variants:

  • filtering with list comprehension
with open('mytxt.txt') as f:
    lst = [
        [word for word in line.rstrip().split('|') if word]
        for line in f
    ]
print(lst)
  • split with regex
import re

with open('mytxt.txt') as f:
    lst = [
        re.split(r'\|+', line.rstrip())
        for line in f
    ]
print(lst)

Both produce the same output for this file:

[['John', 'Hopkins', '31'], ['Sage', 'Jen', '42']]
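
The two variants agree on the sample file, but they can differ when a line starts or ends with the delimiter. A minimal sketch, assuming a hypothetical line that is not in the original file: the filtering variant drops every empty field, while re.split still returns an empty string at each end unless the outer delimiters are stripped first.

```python
import re

# Hypothetical line with a leading and a trailing '|' (not from the original file)
line = "|John||Hopkins|31|\n"

# Variant 1: split on every '|', then discard empty fields
filtered = [word for word in line.rstrip().split('|') if word]

# Variant 2: split on runs of '|'; the edges still yield empty strings
regex_split = re.split(r'\|+', line.rstrip())

# Variant 2 with the outer delimiters stripped before splitting
trimmed = re.split(r'\|+', line.strip().strip('|'))

print(filtered)     # ['John', 'Hopkins', '31']
print(regex_split)  # ['', 'John', 'Hopkins', '31', '']
print(trimmed)      # ['John', 'Hopkins', '31']
```

If the data can contain leading or trailing delimiters, the filtering variant is probably the safer default.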

Upvotes: 0

Shahir Ansari

Reputation: 1848

Reading the file into a pandas DataFrame and then collecting the rows as lists would also work.

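import pandas

# header=None keeps the first line as data; dtype=str keeps '31' and '42' as strings.
# Assumes no later line has more fields than the first one.
df = pandas.read_csv("mytxt.txt", sep="|", header=None, dtype=str)
rowlist = []
for index, rows in df.iterrows():
    one_row = [rows[column] for column in df.columns]   # get the data as a list
    one_row = [x for x in one_row if pandas.notna(x)]   # remove null values
    rowlist.append(one_row)                             # append to main list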

Output = [['John', 'Hopkins', '31'], ['Sage', 'Jen', '42']]

Upvotes: 0

Himanshu

Reputation: 676

Add this below your line=line.split('|') line:

line = [word for word in line if word!='']
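
For context, here is a rough sketch of how the loop from the question might look with that extra line in place. The function name split_rows and the in-memory sample list are assumptions made for illustration; in the question the lines come from file.readlines().

```python
def split_rows(lines):
    """Split each line on '|' and drop the empty fields left by consecutive delimiters."""
    rows = []
    for line in lines:
        fields = line.strip('\n').split('|')
        # The added step: keep only non-empty fields
        fields = [word for word in fields if word != '']
        rows.append(fields)
    return rows


# Stand-in for file.readlines() on the file from the question
sample = ["John|Hopkins|||31\n", "Sage|Jen|42\n"]
print(split_rows(sample))
# [['John', 'Hopkins', '31'], ['Sage', 'Jen', '42']]
```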

Upvotes: 2

Selcuk

Reputation: 59228

You may try filtering out empty values:

line = list(filter(None, line.split("|")))
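
One thing to keep in mind in Python 3: filter returns a lazy iterator rather than a list, so it needs to be wrapped in list() before it is appended and printed. A small sketch, using one line from the question as an in-memory string:

```python
line = "John|Hopkins|||31"

# filter(None, ...) keeps only truthy items, so empty strings are dropped
fields = filter(None, line.split("|"))
kind = type(fields).__name__   # 'filter', not 'list'

# Materialise the iterator to get an ordinary list
fields_list = list(fields)

print(kind)         # filter
print(fields_list)  # ['John', 'Hopkins', '31']
```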

Upvotes: 1
