Reputation: 5351
I have a file like this:
100,2015-09-01
100,2015-09-02
105,2015-09-01
645,2017-01-01
It's just a number and a date separated by a comma. It has 20000 lines.
I want to transform this file into a spreadsheet (Google, Excel, LibreOffice, any format, even a CSV) that says for each number if it appears for a given date. For example:
ID | 2015-09-01 | 2015-09-02 | 2017-01-01
----+------------+------------+-----------
100 | YES | YES | NO
105 | YES | NO | NO
645 | NO | NO | YES
How can I do that?
I'm on Linux, so command line utilities like sed
, awk
, etc are available and preferred as a way to automate this.
Upvotes: 2
Views: 64
Reputation: 12668
The CSV equivalent to the data you post (to be read by excel) is:
ID,2015-09-01,2015-09-02,2017-01-01
100,YES,YES,NO
105,YES,NO,NO
645,NO,NO,YES
as you can easily test if you create that spreadsheet and export it to CSV format.
So, to be able to read it as CSV, you have first to convert your data from the format you posted to the format I post above, and then import it as CSV. This can be done with the AWK script that has been posted here also by @ThiagoNegri, so I'm not going to repeat it again.
Upvotes: 0
Reputation: 5351
Could make it work using AWK.
awk -F "," '{
numbers[$1]
dates[$2]
number_date[$1,$2]
} END {
for (number in numbers) printf ";%s", number;
printf "\n"
for (date in dates) {
printf "%s", date
for (number in numbers)
if ((number SUBSEP date) in number_date) printf ";YES"
else printf ";NO"
printf "\n"
}
}' input.csv | sort > output.csv
Upvotes: 1