As a beginner in Python, what I'm trying to achieve sounds very easy, but I'm unable to get Python to work as desired.
I have a CSV file with the headers Area and Facility, like this:
Area Facility
AAA car, train, bus
BBB car
CCC car, bus, tram
DDD bicycle
EEE car, bus, train, tram, walk
FFF train, tram, plane, helicopter
I am trying to split the 'Facility' column into the individual words and then run some queries on them (e.g. find the unique facilities). My desired output is a list of the facilities from column 2: train, tram, plane, walk, etc.
I am able to split the CSV into the two columns, but if I iterate further, it breaks each value down into single letters.
import csv

fOpen1 = open(r'C:\data.csv')
Facilities = csv.reader(fOpen1)
unique = []
for row in Facilities:
    for facility in row[1]:  # row[1] is a string, so this loops over its characters
        if facility not in unique:
            unique.append(facility)
I looked around and noticed people using splitlines, but had no luck using it either.
Any suggestions or ideas?
Thank you!
The csv module splits columns on commas. Since there is no comma between the first column and the second column, each row will be read like this:
['Area Facility']
['AAA car', ' train', ' bus']
['BBB car']
['CCC car', ' bus', ' tram']
['DDD bicycle']
['EEE car', ' bus', ' train', ' tram', ' walk']
['FFF train', ' tram', ' plane', ' helicopter']
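The rows above can be reproduced without a file by feeding a few lines of the sample data to csv.reader through io.StringIO. This is a sketch that assumes, as the answer does, that a space (not a comma) separates the Area and Facility columns.

```python
import csv
import io

# First lines of the sample data; Area and Facility are separated by a space
sample = """Area Facility
AAA car, train, bus
BBB car
"""

rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    print(row)
# ['Area Facility']
# ['AAA car', ' train', ' bus']
# ['BBB car']
```

Passing skipinitialspace=True to csv.reader would drop the space that follows each comma, so the items would come back as 'train' and 'bus' rather than ' train' and ' bus'.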
Thus, you can split the first element of the list to get the first facility. The other facilities are stored in the rest of the list. Your goal can be achieved as follows:
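import csv

with open(r'C:\data.csv', newline='') as fOpen1:
    Facilities = csv.reader(fOpen1)
    next(Facilities)  # skip the 'Area Facility' header row
    unique = []
    for row in Facilities:
        # row[0] looks like 'AAA car'; split() with no argument splits on whitespace
        first_facility = row[0].split()[1]
        if first_facility not in unique:
            unique.append(first_facility)
        # the remaining items look like ' train', so strip the leading space
        for rest_facility in row[1:]:
            rest_facility = rest_facility.strip()
            if rest_facility not in unique:
                unique.append(rest_facility)
print(unique)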
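The same idea can be wrapped in a function that takes any file-like object, which makes it easy to try on a string. This is a sketch, not the answer's exact code: the function name unique_facilities is hypothetical, it assumes the layout shown in the question, and it swaps the list membership checks for a dict (whose keys keep insertion order in Python 3.7+) so each facility is kept once in first-seen order.

```python
import csv
import io

def unique_facilities(f):
    """Return the facilities in first-seen order, without duplicates."""
    # skipinitialspace=True drops the space that follows each comma
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)  # skip the 'Area Facility' header row
    seen = {}  # dict keys keep insertion order (Python 3.7+)
    for row in reader:
        # Split off the area code once, leaving the first facility
        _area, first = row[0].split(None, 1)
        for facility in [first] + row[1:]:
            seen[facility.strip()] = None
    return list(seen)

sample = """Area Facility
AAA car, train, bus
BBB car
CCC car, bus, tram
DDD bicycle
EEE car, bus, train, tram, walk
FFF train, tram, plane, helicopter
"""

result = unique_facilities(io.StringIO(sample))
print(result)
# ['car', 'train', 'bus', 'tram', 'bicycle', 'walk', 'plane', 'helicopter']
```

With the real file, the call would be made inside with open(r'C:\data.csv', newline='') as f: unique_facilities(f).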
Here is the documentation for split:
Docstring: S.split(sep=None, maxsplit=-1) -> list of strings
Return a list of the words in S, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done. If sep is not specified or is None, any whitespace string is a separator and empty strings are removed from the result.
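To make the docstring concrete, here is how sep=None, an explicit ' ' separator, and maxsplit behave. The input string is made up for illustration (note the double space after AAA).

```python
s = 'AAA  car, bus'  # hypothetical line with two spaces after the area code

# sep=None: any run of whitespace is one separator, no empty strings
default_split = s.split()
print(default_split)  # ['AAA', 'car,', 'bus']

# An explicit ' ' separator keeps the empty string between the two spaces
space_split = s.split(' ')
print(space_split)  # ['AAA', '', 'car,', 'bus']

# maxsplit=1: stop after the first split, leaving the rest intact
once = s.split(None, 1)
print(once)  # ['AAA', 'car, bus']
```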
Basically, if you call split with no argument, it splits on whitespace (which is what separates the columns in your dataset). You can split on any other character by passing that character to split, e.g.:
print("car, train, bus".split(','))
['car', ' train', ' bus']
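Since split(',') leaves a leading space on every item after the first, strip() each piece before comparing. A sketch of collecting the unique facilities this way, using a Python set; the three lines below are Facility values copied from the question's data.

```python
lines = ['car, train, bus', 'car, bus, tram', 'train, tram, plane, helicopter']

unique = set()
for line in lines:
    # strip() removes the space that split(',') leaves after each comma
    unique.update(part.strip() for part in line.split(','))

print(sorted(unique))
# ['bus', 'car', 'helicopter', 'plane', 'train', 'tram']
```

A set discards duplicates automatically but does not remember order, hence the sorted() call for stable output.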