Reputation: 525
I have a comma-separated CSV file with three columns:
"Date", "URL", "Views"
and I am trying to extract the rows that contain specific keywords in the URL column, for example the word charger.
import pandas as pd

keywords = {"charger"}
df = pd.read_csv("original_file.csv", sep=",")

# Collect every URL that contains at least one keyword
listMatchURL = []
for i in range(len(df.index)):
    if any(x in df['URL'][i] for x in keywords):
        listMatchURL.append(df['URL'][i])

output = pd.DataFrame({'URL': listMatchURL})
output.to_csv("new_file.csv", index=False)
This writes the entire URL of every row that contains the keyword to a new CSV file. But how can I extract and write only the keyword, instead of the entire URL? I don't want to extract the entire http://www.example.com/search/iphone+charger.html but simply charger.
Also, how can I keep the two other corresponding columns, Date and Views, in the new CSV file I'm writing? For now, it extracts only the URL column.
I'm looking to get a new csv file that has the columns:
"Date", "Keyword", "Views"
Upvotes: 0
Views: 1429
Reputation: 46759
As an alternative, this could be done without Pandas as follows:
import csv

keywords = {"charger"}

with open('original_file.csv', newline='') as f_input, open('new_file.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    header = next(csv_input)  # Skip the original header row
    csv_output.writerow(['Date', 'Keyword', 'Views'])

    for date, url, views in csv_input:
        for keyword in keywords:
            if keyword in url:
                csv_output.writerow([date, keyword, views])
                break  # Remove if multiple keywords per url are allowed
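If you would rather stay with Pandas, the same result can probably be reached with Series.str.extract. The following is only a minimal sketch: the helper name extract_keyword_rows and the sample rows are made up for illustration, and it assumes one keyword per URL is enough (the first match in the URL wins).

```python
import re
import pandas as pd

def extract_keyword_rows(df, keywords):
    """Return Date/Keyword/Views for rows whose URL contains a keyword.

    Hypothetical helper, not part of pandas.
    """
    # One capture group holding all keywords; re.escape protects
    # keywords that contain regex metacharacters such as "+" or "."
    pattern = "(" + "|".join(re.escape(k) for k in sorted(keywords)) + ")"
    result = df.copy()
    # str.extract yields the first match per row, NaN where nothing matches
    result["Keyword"] = result["URL"].str.extract(pattern, expand=False)
    # Keep only matching rows and the three requested columns
    return result.dropna(subset=["Keyword"])[["Date", "Keyword", "Views"]]

# Made-up sample data standing in for original_file.csv
df = pd.DataFrame({
    "Date": ["2017-01-01", "2017-01-02", "2017-01-03"],
    "URL": [
        "http://www.example.com/search/iphone+charger.html",
        "http://www.example.com/search/iphone+case.html",
        "http://www.example.com/search/usb+charger.html",
    ],
    "Views": [10, 20, 30],
})

output = extract_keyword_rows(df, {"charger"})
```

With a real file, df would come from pd.read_csv("original_file.csv") and the result could be written with output.to_csv("new_file.csv", index=False).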
Upvotes: 1