Reputation: 190
I'm working with about 24k text files and am splitting some lines on '-'. It works for some files, but for others the split fails.
company_participants is a list with N >= 1 elements, with each element consisting of a name followed by a hyphen ("-"), followed by the job title. To get the names, I use:
names_participants = [name.split('-')[0].strip() for name in company_participants]
On closer inspection, I found that it does not recognise "-" as "-" for some reason.
For example, the first element in company_participants is "robert isom - president".
Calling company_participants[0].split()[2] returns "-", since I've split on whitespace and the hyphen is the third element (index 2).
When I then test whether this is equal to "-", I get False.
company_participants[0].split()[2] == "-" # Item at index 2 is the hyphen
# Output = False
Any idea what's going on here? Is there something else that looks like a hyphen but isn't one?
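One way to check is to print the code point and Unicode name of the suspect character rather than trusting how it renders. This is only a minimal sketch: the sample string is a hypothetical stand-in for company_participants[0], with the separator written as an escape so there is no ambiguity about which character it contains.

```python
import unicodedata

# Hypothetical stand-in for company_participants[0]; the separator is
# written as \u2013 on purpose so the character is unambiguous.
sample = "robert isom \u2013 president"

# Same access pattern as in the question: third whitespace-separated token
token = sample.split()[2]

# Show the code point and official Unicode name of each character in the token
for ch in token:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))

# Compare against the ASCII hyphen-minus typed on the keyboard
print(token == "-")
```

If the name printed is anything other than HYPHEN-MINUS, a plain split('-') will not match that character.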
Many thanks!
Upvotes: 0
Views: 88
Reputation: 190
So I found that this has actually been answered elsewhere on StackOverflow.
Apparently I'm dealing with a "dash" (an en dash, U+2013) and not a "hyphen" (the hyphen-minus, U+002D, that the keyboard types). I couldn't see the difference with the naked eye, but when I copied the symbol from that answer, it was recognised: company_participants[0].split()[2] == "–" returned True.
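With 24k files it may be safer to split on any dash-like character instead of a single one. The following is a rough sketch, not a definitive fix: the DASHES set, the SEPARATOR pattern and the participant_name helper are names made up for illustration, and the second list entry is invented sample data. It also assumes the separator always has whitespace on both sides, which is stricter than the original split('-') but leaves hyphenated surnames alone.

```python
import re

# Assumed set of separators: hyphen-minus, hyphen, non-breaking hyphen,
# figure dash, en dash, em dash, horizontal bar, minus sign.
DASHES = "\u002D\u2010\u2011\u2012\u2013\u2014\u2015\u2212"

# Require whitespace around the dash so names like "smith-jones" stay whole
# (an assumption about the data, not something the original code did).
SEPARATOR = re.compile(rf"\s+[{re.escape(DASHES)}]\s+")


def participant_name(entry):
    """Return the text before the first dash-like separator."""
    return SEPARATOR.split(entry, maxsplit=1)[0].strip()


company_participants = [
    "robert isom \u2013 president",                # en dash, as in the question
    "jane smith-jones - chief financial officer",  # hypothetical entry
]

names_participants = [participant_name(e) for e in company_participants]
print(names_participants)
```

If some files have no spaces around the separator, the \s+ parts would need to become \s*, at the cost of also splitting hyphenated names.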
#textDataProblems
#didNotSeeThatComing
Thank you StackOverflow!
Upvotes: 2