Reputation: 583
I have found a .txt file with the names of more than 5000 cities around the world. The link is here. The text within is messy. I would like to read the file in Python and store it in a list, so that I can search for the name of a city whenever I want.
I tried loading it as a DataFrame with:
import pandas as pd
cities = pd.read_csv('cities15000.txt',error_bad_lines=False)
However, everything looks very messy. Is there an easier way to achieve this? Thanks in advance!
Upvotes: 0
Views: 273
Reputation: 425
The linked file is like a CSV (Comma Separated Values) file, but it uses tabs instead of commas as the field separator. Set the sep parameter of the pd.read_csv function to \t, i.e. the tab character.
In [18]: import pandas as pd
...:
...: pd.read_csv('cities15000.txt', sep = '\t', header = None)
Out[18]:
0 1 2 3 4 5 ... 13 14 15 16 17 18
0 3040051 les Escaldes les Escaldes Ehskal'des-Ehndzhordani,Escaldes,Escaldes-Engo... 42.50729 1.53414 ... NaN 15853 NaN 1033 Europe/Andorra 2008-10-15
1 3041563 Andorra la Vella Andorra la Vella ALV,Ando-la-Vyey,Andora,Andora la Vela,Andora ... 42.50779 1.52109 ... NaN 20430 NaN 1037 Europe/Andorra 2020-03-03
2 290594 Umm Al Quwain City Umm Al Quwain City Oumm al Qaiwain,Oumm al Qaïwaïn,Um al Kawain,U... 25.56473 55.55517 ... NaN 62747 NaN 2 Asia/Dubai 2019-10-24
3 291074 Ras Al Khaimah City Ras Al Khaimah City Julfa,Khaimah,RAK City,RKT,Ra's al Khaymah,Ra'... 25.78953 55.94320 ... NaN 351943 NaN 2 Asia/Dubai 2019-09-09
4 291580 Zayed City Zayed City Bid' Zayed,Bid’ Zayed,Madinat Za'id,Madinat Za... 23.65416 53.70522 ... NaN 63482 NaN 124 Asia/Dubai 2019-10-24
... ... ... ... ... ... ... ... ... ... .. ... ... ...
24563 894701 Bulawayo Bulawayo BUQ,Bulavajas,Bulavajo,Bulavejo,Bulawayo,bu la... -20.15000 28.58333 ... NaN 699385 NaN 1348 Africa/Harare 2019-09-05
24564 895061 Bindura Bindura Bindura,Bindura Town,Kimberley Reefs,Биндура -17.30192 31.33056 ... NaN 37423 NaN 1118 Africa/Harare 2010-08-03
24565 895269 Beitbridge Beitbridge Bajtbridz,Bajtbridzh,Beitbridge,Beitbridzas,Be... -22.21667 30.00000 ... NaN 26459 NaN 461 Africa/Harare 2013-03-12
24566 1085510 Epworth Epworth Epworth -17.89000 31.14750 ... NaN 123250 NaN 1508 Africa/Harare 2012-01-19
24567 1106542 Chitungwiza Chitungwiza Chitungviza,Chitungwiza,Chytungviza,Citungviza... -18.01274 31.07555 ... NaN 340360 NaN 1435 Africa/Harare 2019-09-05
[24568 rows x 19 columns]
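If all you need is a plain list of names to search, here is a minimal sketch that skips the DataFrame and uses the standard library's csv module with a tab delimiter. It assumes the city name is the second field (column 1 in the output above). The helper names load_city_names and find_cities are made up for illustration, and the three-row sample only stands in for the real file so the snippet runs on its own.

```python
import csv
import io


def load_city_names(lines):
    """Return the name field (index 1) of every tab-separated row."""
    # QUOTE_NONE treats quote characters inside names as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row[1] for row in reader if len(row) > 1]


def find_cities(names, fragment):
    """Case-insensitive substring search over the list of names."""
    frag = fragment.casefold()
    return [name for name in names if frag in name.casefold()]


# Stand-in sample with only the first three fields of three rows.
sample = io.StringIO(
    "3040051\tles Escaldes\tles Escaldes\n"
    "3041563\tAndorra la Vella\tAndorra la Vella\n"
    "894701\tBulawayo\tBulawayo\n"
)

city_names = load_city_names(sample)
city_set = set(city_names)  # fast exact-match lookups

print("Bulawayo" in city_set)              # True
print(find_cities(city_names, "andorra"))  # ['Andorra la Vella']
```

For the real file, something like open('cities15000.txt', encoding='utf-8', newline='') passed to load_city_names should work in place of the sample; the UTF-8 encoding is an assumption about the file. If you already have the DataFrame from above, df[1].tolist() gives the same list of names.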
Upvotes: 1