Reputation: 1466
I have a file that contains the following unstructured data. It's the output of a Google Trends export, and there are about four or five sets of "tables" stacked on top of each other in a single spreadsheet.
['2015-10-25', '100']
['2015-10-26', '88']
['2015-10-27', '82']
['2015-10-28', '72']
['2015-10-29', '68']
['2015-10-30', '73']
['2015-10-31', '85']
['2015-11-01', '98']
['2015-11-02', ' ']
['2015-11-03', ' ']
['2015-11-04', ' ']
[]
[]
['Top subregions for nespresso']
['Subregion', 'nespresso']
['New York', '100']
['Massachusetts', '83']
['California', '83']
['New Jersey', '80']
['Washington', '77']
['Florida', '72']
['Maryland', '64']
['District of Columbia', '63']
['Colorado', '61']
What I'm trying to do is select just the rows that contain date strings, which always form the first table (with a few header rows above it). Here's what I have at the moment. It doesn't work: it returns an empty data list.
import csv
with open('GT_Trends_Daily.csv', 'rt') as csvfile:
    csvReader = csv.reader(csvfile)
    data = []
    for row in csvReader:
        dat = [s for row in csvReader if "2015" in s]
        data.append(dat)
for i in data:
    print i
I have a solution for this in R, but I'd love to switch over to Python one of these days and so I've been digging into how I could solve this.
Upvotes: 1
Views: 47
Reputation: 150178
This list comprehension is probably not doing what you want (in fact, it raises a NameError unless you've defined s before):
dat = [s for row in csvReader if "2015" in s]
You can populate data using a list comprehension like this:
data = [row for row in csvReader if row and row[0].startswith("2015")]
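For reference, here is a minimal self-contained sketch of that filter. It assumes Python 3 and uses a short in-memory list of lines (SAMPLE_LINES, made up to mirror the data in the question) in place of GT_Trends_Daily.csv; date_rows is a hypothetical helper name, not part of the csv module.

```python
import csv

# Made-up stand-in for the file contents; csv.reader accepts any
# iterable of strings, so a list of lines works like an open file.
SAMPLE_LINES = [
    "2015-10-25,100",
    "2015-10-26,88",
    "2015-11-02, ",
    "",
    "Top subregions for nespresso",
    "Subregion,nespresso",
    "New York,100",
]

def date_rows(lines, prefix="2015"):
    """Keep non-empty rows whose first cell starts with `prefix`."""
    reader = csv.reader(lines)
    # `row and ...` skips the empty lists produced by blank lines,
    # so row[0] is never evaluated on an empty row.
    return [row for row in reader if row and row[0].startswith(prefix)]

rows = date_rows(SAMPLE_LINES)
for row in rows:
    print(row)
```

With a real file, you would pass the open file object instead of SAMPLE_LINES. Rows whose value cell is blank (such as the 2015-11-02 row) are kept, because only the first cell is tested.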
Upvotes: 1
Reputation: 5515
I think you want this:
for row in csvReader:
    if any('2015' in s for s in row):
        data.append(row)
Unless you only want to append the date, in which case:
for row in csvReader:
    dat = [s for s in row if '2015' in s]
    if dat:
        data.append(dat)
Your main problem was that your list comprehension iterated over every row in csvReader again, when that's what the for loop already does.
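To make the difference between the two variants concrete, here is a small sketch that runs both on a few already-parsed rows. It assumes Python 3; the rows list is made-up sample data standing in for what csv.reader would yield, and whole_rows and dates_only are illustrative names.

```python
# Made-up rows standing in for the output of csv.reader.
rows = [
    ["2015-10-25", "100"],
    [],
    ["Top subregions for nespresso"],
    ["New York", "100"],
    ["2015-11-02", " "],
]

# Variant 1: keep the whole row if any cell contains "2015".
whole_rows = []
for row in rows:
    if any("2015" in s for s in row):
        whole_rows.append(row)

# Variant 2: keep only the cells that contain "2015".
dates_only = []
for row in rows:
    dat = [s for s in row if "2015" in s]
    if dat:  # skip rows with no matching cell
        dates_only.append(dat)

print(whole_rows)
print(dates_only)
```

Note that `'2015' in s` is a substring test on every cell, so a value cell that happened to contain "2015" would match as well.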
Upvotes: 1
Reputation: 2093
Your list comprehension is wrong. Try:
dat = [s for s in row if "2015" in s]
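If the export ever spans more than one year, the hard-coded "2015" test stops working. One possible alternative, sketched below under the assumption that the dates always look like YYYY-MM-DD as in the question, is to try parsing the first cell with datetime.strptime. This assumes Python 3; is_date and rows_starting_with_date are hypothetical helper names, and the sample lines are made up.

```python
import csv
from datetime import datetime

def is_date(text, fmt="%Y-%m-%d"):
    """Return True if `text` parses as a date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

def rows_starting_with_date(lines):
    """Keep non-empty rows whose first cell parses as a date."""
    return [row for row in csv.reader(lines) if row and is_date(row[0])]

# Made-up sample lines mirroring the structure in the question.
sample = [
    "2015-10-25,100",
    "2016-01-03,41",
    "",
    "Top subregions for nespresso",
    "Subregion,nespresso",
    "New York,100",
]
result = rows_starting_with_date(sample)
print(result)
```

This trades a simple string test for a stricter check, which may or may not be worth it for a one-off file.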
Upvotes: 1