Siyah

Reputation: 2897

How to get rid of empty strings in Python when splitting a list?

I have an input file which consists of these lines:

['Some Name__________2.0 2.0 1.3\n', 'Some Name__________1.0 9.0 1.0\n', # and so on....]

After reading it with readlines() and splitting each line on "_", I get this:

['Some Name', '', '', '', '2.0 2.0 1.3\n']
['Another Name', '', '', '', '1.0 9.0 1.0\n']
['Another Name', '', '', '', '1.0 9.0 1.0\n']
# and so on

What I want to do is list the names one beneath the other, getting rid of the _ characters.

This is my code:

def openFile():
    fileFolder = open('TEXTFILE', 'r')
    readMyFile = fileFolder.readlines()

    for line in readMyFile:
        line = line.split("_")

        personNames = line[0]

        print(personNames)

openFile()

So what I get now is:

Some Name
Another Name
Another Name

That works, but I want to go further, and this is where I'm stuck. Next, I want to get rid of the empty strings ("") and print the numbers you can see right beside the names I've already extracted.

I thought that I could just do this:

for line in readFile:
    line = line.split("_")
    get_rid_of_spaces = line.split() #getting rid of spaces too

    personNames = line[0]

But this gives me this error:

AttributeError: 'list' object has no attribute 'split'

How can I do this? I want to learn this.

I also tried incrementing the index, but that failed, and I read that it isn't the best approach, so I'm trying this instead.

Besides that, I'd expect line[1] to give me one of the empty strings, but it doesn't.

What am I missing here?

Upvotes: 2

Views: 3514

Answers (6)

Batman

Reputation: 8927

Use a list comprehension to remove the empty strings.

for line in read_file:
    tokens = [x for x in line.split("_") if x != ""]
    person_name = tokens[0]
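A self-contained sketch of the same list-comprehension approach, with two sample lines copied from the question standing in for the file (the `lines`, `numbers` and `results` names are illustrative). It also splits the number part on whitespace, which drops the trailing newline:

```python
lines = ['Some Name__________2.0 2.0 1.3\n', 'Another Name__________1.0 9.0 1.0\n']

results = []
for line in lines:
    # Split on "_" and keep only the non-empty pieces.
    tokens = [x for x in line.split("_") if x != ""]
    person_name = tokens[0]
    # str.split() with no argument splits on any whitespace and
    # discards the trailing "\n".
    numbers = tokens[1].split()
    results.append((person_name, numbers))
    print(person_name, " ".join(numbers))
```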

Upvotes: 2

Juan Diego Godoy Robles

Reputation: 14955

Just use re.split to take advantage of a multi-character delimiter:

>>> import re
>>> 
>>> line = 'Some Name__________2.0 2.0 1.3\n'
>>> re.split(r'_+', line)
['Some Name', '2.0 2.0 1.3\n']

Example in a for loop:

>>> lines = ['Some Name__________2.0 2.0 1.3\n', 'Some Name__________1.0 9.0 1.0\n']
>>> for dat in [re.split(r'_+|\n', line) for line in lines]:
...     person = dat[0]
...     numbers = dat[1]
...     print(person, numbers)
... 
Some Name 2.0 2.0 1.3
Some Name 1.0 9.0 1.0
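A Python 3 sketch of the same idea that strips the newline first and passes maxsplit=1 to re.split, so only the first run of underscores acts as the delimiter. It assumes a name never contains an underscore and that every line has one; `parse_line` is a hypothetical helper name, and unpacking raises ValueError on a line with no underscores:

```python
import re

def parse_line(line):
    """Return (name, numbers) from a line like 'Name____1.0 2.0 3.0'."""
    # Remove the trailing newline, then split on the first run of one or
    # more underscores only.
    name, numbers = re.split(r'_+', line.rstrip('\n'), maxsplit=1)
    return name, numbers

lines = ['Some Name__________2.0 2.0 1.3\n', 'Some Name__________1.0 9.0 1.0\n']
for line in lines:
    person, numbers = parse_line(line)
    print(person, numbers)
```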

Upvotes: 4

Alvaro

Reputation: 12037

The output of str.split is a list, and lists don't have a split method; that's why you get that error.
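A short demonstration of why the error occurs and one way around it: str.split returns a list, so a second split has to be applied to a string element of that list rather than to the list itself (the sample line is taken from the question):

```python
line = 'Some Name__________2.0 2.0 1.3\n'

parts = line.split("_")          # a list of strings
print(type(parts).__name__)      # list

try:
    parts.split()                # lists have no .split method
except AttributeError as exc:
    print("AttributeError:", exc)

# Split an element (a string) instead, e.g. the last one holding the numbers.
numbers = parts[-1].split()
print(numbers)                   # ['2.0', '2.0', '1.3']
```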

You can instead do:

with open('yourfile') as f:
    for line in f:
        # Drop the trailing newline before splitting.
        split = line.rstrip('\n').split('_')
        name, number = split[0], split[-1]
        print('{}-{}'.format(number, name))

Several things to note:

1) Don't use camelCase for names; PEP 8 recommends snake_case for variables and functions.

2) Use a context manager (the with statement) for files; it closes the file properly even if something fails.

3) Pay attention to the line for line in f:. It iterates over the file one line at a time, so the whole file is never held in memory.
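A runnable sketch of points 2 and 3, assuming a throwaway sample file created with tempfile in place of the real input (the file contents and the `rows` list are illustrative). The with block closes the file on exit, and iterating over f reads one line at a time:

```python
import os
import tempfile

sample = 'Some Name__________2.0 2.0 1.3\nAnother Name__________1.0 9.0 1.0\n'

# Write a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

rows = []
try:
    with open(path) as f:            # closed automatically, even on error
        for line in f:               # yields one line at a time
            parts = line.rstrip('\n').split('_')
            name, numbers = parts[0], parts[-1]
            rows.append((name, numbers))
            print('{} {}'.format(name, numbers))
    print(f.closed)                  # True: the with block closed the file
finally:
    os.remove(path)                  # clean up the temporary file
```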

Upvotes: 1

abacles

Reputation: 859

readfile = ['Some name____2.0 2.1 1.3', 'Some other name_____2.2 3.4 1.1']

data = []
for line in readfile:
    first_split = [part for part in line.split('_') if part != '']
    # split() with no argument also drops a trailing '\n' from real file lines
    data.append([first_split[0], first_split[1].split()])

print(data)

If I understood you correctly, this does what you want. It prints:

[['Some name', ['2.0', '2.1', '1.3']], ['Some other name', ['2.2', '3.4', '1.1']]]

Upvotes: 0

Ahasanul Haque

Reputation: 11134

>>> a =['Some Name__________2.0 2.0 1.3\n', 'Some Name__________1.0 9.0 1.0\n']
>>> import re
>>> [re.search(r'_+(.+)$', i.rstrip()).group(1) for i in a]
['2.0 2.0 1.3', '1.0 9.0 1.0']
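The regex above captures only the numbers. A sketch that captures the name as well by using two groups; the pattern is an adaptation, not the answer's original, and assumes the name itself contains no underscores (if a line doesn't match, match() returns None and .groups() would fail):

```python
import re

a = ['Some Name__________2.0 2.0 1.3\n', 'Some Name__________1.0 9.0 1.0\n']

# Group 1: everything before the first underscore (the name).
# Group 2: everything after the run of underscores (the numbers).
pattern = re.compile(r'^([^_]+)_+(.+)$')

pairs = [pattern.match(i.rstrip()).groups() for i in a]
print(pairs)
# [('Some Name', '2.0 2.0 1.3'), ('Some Name', '1.0 9.0 1.0')]
```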

Upvotes: 1

Francisco

Reputation: 11486

You could do something like this:

for line in readFile:
    line = line.split("_")
    line = list(filter(bool, line))  # list() needed in Python 3, where filter is lazy

This removes all the empty strings from the line list, because empty strings are falsy (bool('') is False).
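A complete sketch applying this to the question's sample lines (the `lines` and `pairs` names are illustrative); it also strips the newline from the number part so the numbers print on the same line as the name:

```python
lines = ['Some Name__________2.0 2.0 1.3\n', 'Another Name__________1.0 9.0 1.0\n']

pairs = []
for line in lines:
    # filter(bool, ...) drops every falsy item, i.e. the empty strings;
    # list() turns Python 3's lazy filter object into an indexable list.
    tokens = list(filter(bool, line.split("_")))
    name, numbers = tokens[0], tokens[1].strip()
    pairs.append((name, numbers))
    print(name, numbers)
```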

Upvotes: 1
