Reputation: 6809
I have a dataframe that has a weird format that I am having difficulty formatting it to a desired format. I just need the columns first_name
, last_name
, domain
, Email
, Verification
and status
but am not sure how to remove it when it is in this format.
,first_name,last_name,domain,Email,Score,Verification status,Source 1,Source 2,Source 3,Source 4,Source 5,Source 6,Source 7,Source 8,Source 9,Source 10,Source 11,Source 12,Source 13,Source 14,Source 15,Source 16,Source 17,Source 18,Source 19,Source 20,Source 21,Source 22,Source 23,Source 24,Source 25,Source 26,Source 27,Source 28,Source 29,Source 30
25 138,Ethan,Gach,kotaku.com,[email protected]...
79 535,Carla,Mozée,businessinsider.com,cmozee@bus...
114 814,Neil,Irwin,nytimes.com,neil.irwin@nytimes....
first_name last_name domain Email Score Verification status
0 Carla Mozée businessinsider.com [email protected] 99 valid
{',first_name,last_name,domain,Email,Score,Verification status,Source 1,Source 2,Source 3,Source 4,Source 5,Source 6,Source 7,Source 8,Source 9,Source 10,Source 11,Source 12,Source 13,Source 14,Source 15,Source 16,Source 17,Source 18,Source 19,Source 20,Source 21,Source 22,Source 23,Source 24,Source 25,Source 26,Source 27,Source 28,Source 29,Source 30': {25: '138,Ethan,Gach,kotaku.com,[email protected],99,valid,http://kotaku.com/new-mod-lets-you-play-game-boy-advance-games-in-hd-on-a-1830834944,http://compete.kotaku.com/the-weekend-in-esports-boston-counter-strike-party-1822479309,http://kotaku.com/madden-nfl-21-will-be-the-first-in-the-series-on-steam-1844050894,http://kotaku.com/gogs-mysterious-curation-process-rejected-opus-magnum-1821843050,http://kotaku.com/a-dramatic-podcast-fills-in-the-gaps-of-fallout-76s-bro-1842064555,http://kotaku.com/electronic-arts-says-its-customers-just-arent-that-inte-1836862047,http://compete.kotaku.com/the-weekend-in-esports-counter-strike-pokemon-and-su-1821353758,http://compete.kotaku.com/brazilians-prove-unstoppable-at-australian-counter-stri-1794995886,http://compete.kotaku.com/street-fighter-pro-goes-looking-for-an-energy-drink-mid-1818489146,http://dustinhawkfarmily.blogspot.com/2016/09,http://compete.kotaku.com/virtus-pro-seizes-top-spot-in-dota-2-pro-circuit-1823310035,http://deniseuncensored.blogspot.com/2016/05,http://destinyisbae.com/deaf-destiny-player-petitions-bungie-for-captioning,http://destinyisbae.com/destiny-player-destroys-latest-raid-boss-all-by-himself,http://kinja.com/ethangach/following,http://kinja.com/ethangach/followers,http://cosplay.kotaku.com/police-mistake-fallout-cosplay-for-bomb-1794353299,http://cosplay.kotaku.com/tracer-and-emily-cosplay-celebrates-romantic-overwatch-1790505557,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,',
79: '535,Carla,Mozée,businessinsider.com,[email protected],99,valid,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,',
114: '814,Neil,Irwin,nytimes.com,[email protected],96,valid,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,',
435: '5308,Alyssa,Therrien,dailyhive.com,[email protected],96,valid,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,',
478: '6007,Bruce,Kamich,thestreet.com,[email protected],97,valid,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,',
494: '6255,Zach,Brendza,triblive.com,[email protected],96,valid,http://triblive.com/news/all-stories,http://triblive.com/author/zach-brendza,http://triblive.com/aande/movies-tv/mighty-ducks-fly-together-for-reunion-at-nhl-game-secret-project,http://triblive.com/business/technology/gillette-ad-brings-mixed-response-400k-youtube-dislikes,http://triblive.com/aande/movies-tv/taylor-swift-confirms-cats-role-on-instagram,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,',
518: '6719,Bernhard,Warner,fortune.com,[email protected],96,valid,http://asherradio.com/disease-recession-war-games-why-none-of-it-seems-to-count-as-bad-news-for-todays-investors,http://mimicnews.com/could-hong-kong-tensions-sink-the-global-stocks-rally,http://mainstreetalerts.com/2020/01/31/the-coronavirus-is-already-disrupting-the-global-supply-chain-starting-with-these-commodities,http://mainstreetalerts.com/2020/03/16/global-markets-tumble-despite-the-feds-best-efforts-to-restore-confidence,http://mainstreetalerts.com/2020/03/17/global-markets-remain-choppy-after-mondays-historic-plunge,http://isreally.com/2020/02/21/looking-more-probable-analysts-up-the-odds-of-a-markets-correction-as-coronavirus-fears-surge,http://isreally.com/tag/bear-market,http://isreally.com/tag/bull-sheet,http://isreally.com/tag/nasdaq,http://isreally.com/tag/sp-500,http://fortunemediakit.com/editorial-bios,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,',
534: '6965,Clarence,Page,chicagotribune.com,[email protected],98,valid,http://chicagotribune.com/columns/clarence-page/ct-page-police-trump-speech-brutality-perspec-0802-jm-20170801-story.html,http://latimes.com:443/newsletters/politics/ct-page-portland-attack-hate-trump-perspec-0531-jm-20170530-story.html,http://chicagotribune.com/columns/clarence-page/ct-column-truth-fake-news-literacy-page-20201126-rpzogqv2fvdnnmzh6x4py7oblu-story.html,http://chicagotribune.com/columns/clarence-page/ct-column-trump-hawley-twitter-first-amendment-page-20210112-qdt6ehyk3fecnbha5lw4ginwa4-story.html,http://chicagotribune.com/columns/clarence-page/ct-perspec-page-caravan-donald-trump-judge-jon-tigar-1125-20181121-story.html,http://chicagotribune.com/news/columnists/chinews-clarence-page-20130507-staff.html,http://businessandcommunitypartners.blogspot.com/2015/08/i-dont-believe-arc-of-justice-bends-our.html,http://businessandcommunitypartners.blogspot.com/2015/08,http://arcamax.com/politics/fromtheleft/clarencepage/s-2361828-p2,http://arcamax.com/politics/fromtheleft/clarencepage/s-2368961,http://arcamax.com/politics/fromtheleft/clarencepage/s-2314283,http://arcamax.com/politics/fromtheleft/clarencepage/s-2357041,http://destee.com/threads/rachel-dolezals-naturalhair-politics.87634,http://arcamax.com/politics/fromtheleft/clarencepage/s-2302572,http://newsday.com/opinion/commentary/dems-lawsuit-not-as-nutty-as-it-sounds-1.18269124,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,',
558: '7730,Brenden,Rearick,investorplace.com,[email protected],95,valid,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,',
592: '8639,Ian,Chee,lowyat.net,[email protected],98,valid,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,'}}
,first_name,last_name,domain,Email,Score,Verification status,Source 1,Source 2,Source 3,Source 4,Source 5,Source 6,Source 7,Source 8,Source 9,Source 10,Source 11,Source 12,Source 13,Source 14,Source 15,Source 16,Source 17,Source 18,Source 19,Source 20,Source 21,Source 22,Source 23,Source 24,Source 25,Source 26,Source 27,Source 28,Source 29,Source 30
0,Jon,Levine,nypost.com,[email protected],87,accept_all,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,Sujata,Rao,reuters.com,[email protected],94,accept_all,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,Tony,Dokoupil,cbsnews.com,[email protected],87,accept_all,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,Pippa,Stevens,cnbc.com,[email protected],94,unknown,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,Clarissa-jan,Lim,buzzfeednews.com,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
6,Thomas,Franck,cnbc.com,[email protected],92,unknown,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
7,Matt,Phillips,nytimes.com,[email protected],97,valid,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
8,Jake,Johnson,commondreams.org,[email protected],90,accept_all,http://portside.org/node/19308/printable/print,http://beatnikmalcontent.blogspot.com/2019,http://beatnikmalcontent.blogspot.com/2019/02,http://beatnikmalcontent.blogspot.com/2019/02/socialism-is-more-popular-than-you.html,http://portside.org/2019-02-07/socialism-more-popular-you-think-mr-president-ocasio-cortez-says-trump-attack-shows,http://commondreams.org/key-staff,http://24ahead.com/u/johnsonjakep,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
11,Graig,Graziosi,independent.co.uk,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Upvotes: 1
Views: 1357
Reputation: 195468
You can use usecols=
:
df = pd.read_csv("your_data.txt", usecols=[1, 2, 3, 4, 6])
print(df)
Prints:
first_name last_name domain Email Verification status
0 Jon Levine nypost.com [email protected] accept_all
1 Sujata Rao reuters.com [email protected] accept_all
2 Tony Dokoupil cbsnews.com [email protected] accept_all
3 Pippa Stevens cnbc.com [email protected] unknown
4 Clarissa-jan Lim buzzfeednews.com NaN NaN
5 Thomas Franck cnbc.com [email protected] unknown
6 Matt Phillips nytimes.com [email protected] valid
7 Jake Johnson commondreams.org [email protected] accept_all
8 Graig Graziosi independent.co.uk NaN NaN
Upvotes: 2
Reputation: 9629
You can read the file with pandas.read_csv()
with error_bad_lines=False
:
import pandas as pd
df = pd.read_csv('filename.csv', error_bad_lines=False)
df = df[['first_name', 'last_name', 'domain', 'Email', 'Verification status']]
This will output:
first_name | last_name | domain | Verification status | ||
---|---|---|---|---|---|
0 | Jon | Levine | nypost.com | [email protected] | accept_all |
1 | Sujata | Rao | reuters.com | [email protected] | accept_all |
2 | Tony | Dokoupil | cbsnews.com | [email protected] | accept_all |
3 | Pippa | Stevens | cnbc.com | [email protected] | unknown |
4 | Clarissa-jan | Lim | buzzfeednews.com | nan | nan |
5 | Thomas | Franck | cnbc.com | [email protected] | unknown |
6 | Matt | Phillips | nytimes.com | [email protected] | valid |
7 | Graig | Graziosi | independent.co.uk | nan | nan |
You can save it with df.to_csv('filename.csv')
Upvotes: 1