Reputation: 43
In my assessment I'm asked to write a shell script using only bash commands and another shell script using only SQL queries. These scripts should do the following: 1. Clean data in the .csv file (not important at the moment) 2. Sum up earnings based upon gender 3. Produce a simple HTML table
I have made the SQL query produce the correct numbers and HTML file, but with som help from other bash commands. For the file that should only contain bash commands I'm able to get the table but one of the numbers are wrong.
I'm very new to bash scripting and SQL queries so the code isn't very optimised.
The following is a shortned version of the sample input: CSV input
title,site,country,year_release,box_office,director,number_of_subjects,subject,type_of_subject,race_known,subject_race,person_of_color,subject_sex,lead_actor_actress
10 Rillington Place,http://www.imdb.com/title/tt0066730/,UK,1971,-,Richard Fleischer,1,John Christie,Criminal,Unknown,,0,Male,Richard Attenborough
12 Years a Slave,http://www.imdb.com/title/tt2024544/,US/UK,2013,56700000,Steve McQueen,1, Solomon Northup,Other,Known,African American,1,Male,Chiwetel Ejiofor
127 Hours,http://www.imdb.com/title/tt1542344/,US/UK,2010,18300000,Danny Boyle,1,Aron Ralston,Athlete,Unknown,,0,Male,James Franco
1987,http://www.imdb.com/title/tt2833074/,Canada,2014,-,Ricardo Trogi,1,Ricardo Trogi,Other,Known,White,0,Male,Jean-Carl Boucher
20 Dates,http://www.imdb.com/title/tt0138987/,US,1998,537000,Myles Berkowitz,1,Myles Berkowitz,Other,Unknown,,0,Male,Myles Berkowitz
21,http://www.imdb.com/title/tt0478087/,US,2008,81200000,Robert Luketic,1,Jeff Ma,Other,Known,Asian American,1,Male,Jim Sturgess
24 Hour Party People,http://www.imdb.com/title/tt0274309/,UK,2002,1130000,Michael Winterbottom,1,Tony Wilson,Musician,Known,White,0,Male,Steve Coogan
42,http://www.imdb.com/title/tt0453562/,US,2013,95000000,Brian Helgeland,1,Jackie Robinson,Athlete,Known,African American,1,Male,Chadwick Boseman
8 Seconds,http://www.imdb.com/title/tt0109021/,US,1994,19600000,John G. Avildsen,1,Lane Frost,Athlete,Unknown,,0,Male,Luke Perry
84 Charing Cross Road,http://www.imdb.com/title/tt0090570/,US/UK,1987,1080000,David Hugh Jones,2,Frank Doel,Author,Unknown,,0,Male,Anthony Hopkins
84 Charing Cross Road,http://www.imdb.com/title/tt0090570/,US/UK,1987,1080000,David Hugh Jones,2,Helene Hanff,Author,Unknown,,0,Female,Anne Bancroft
A Beautiful Mind,http://www.imdb.com/title/tt0268978/,US,2001,171000000,Ron Howard,1,John Nash,Academic,Unknown,,0,Male,Russell Crowe
A Dangerous Method,http://www.imdb.com/title/tt1571222/,Canada/UK,2011,5700000,David Cronenberg,3,Carl Gustav Jung,Academic,Known,White,0,Male,Michael Fassbender
A Dangerous Method,http://www.imdb.com/title/tt1571222/,Canada/UK,2011,5700000,David Cronenberg,3,Sigmund Freud,Academic,Known,White,0,Male,Viggo Mortensen
A Dangerous Method,http://www.imdb.com/title/tt1571222/,Canada/UK,2011,5700000,David Cronenberg,3,Sabina Spielrein,Academic,Known,White,0,Female,Keira Knightley
A Home of Our Own,http://www.imdb.com/title/tt0107130/,US,1993,1700000,Tony Bill,1,Frances Lacey,Other,Unknown,,0,Female,Kathy Bates
A Man Called Peter,http://www.imdb.com/title/tt0048337/,US,1955,-,Henry Koster,1,Peter Marshall,Other,Known,White,0,Male,Richard Todd
A Man for All Seasons,http://www.imdb.com/title/tt0060665/,UK,1966,-,Fred Zinnemann,1,Thomas More,Historical,Known,White,0,Male,Paul Scofield
A Matador's Mistress,http://www.imdb.com/title/tt0491046/,US/UK,2008,-,Menno Meyjes,2,Lupe Sino,Actress ,Known,Hispanic (White),0,Female,PenÌÎå©lope Cruz
For the SQL queries only file this is my code so far (produces right numbers and correct table):
python3 csv2sqlite.py --table-name test_table --input table.csv --output table.sqlite
echo -e '<TABLE BORDER = "1">
<TR><TH>Gender</TH>
<TH>Total Amount [$]</TH>
</TR>' >> tmp1.txt
sqlite3 biopics.sqlite 'SELECT subject_sex,SUM(earnings) FROM table \
GROUP BY subject_sex;' -html > tmp2.txt
cat tmp2.txt >> tmp1.txt
echo '</TABLE>' >> tmp1.txt
cp tmp1.txt $1
cat $1
rm tmp1.txt tmp2.txt
For the bash only file this is my code so far:
echo -e '<TABLE BORDER = "1">
<TR><TH>Gender</TH>
<TH>Total Amount [$]</TH>
</TR>' >> tmp1.txt
awk -F ',' '{for (i=1;i<=NF;i++)
if ($1)
a[$13] += $5} END{for (i in a) printf("<TR><TD> %s </TD><TD> %i </TD></TR>\n", i, a[i])}' table.csv | sort | head -2 > tmp2.txt
cat tmp2.txt >> tmp1.txt
echo -e "</TABLE>" >> tmp1.txt
cp tmp1.txt $1
cat $1
rm tmp1.txt tmp2.txt
The expected output should look like this:
<TABLE BORDER = "1">
<TR><TH>Gender</TH>
<TH>Total Amount [$]</TH>
</TR>
<TR><TD>Female</TD>
<TD>8480000.0</TD>
</TR>
<TR><TD>Male</TD>
<TD>455947000.0</TD>
</TR>
</TABLE>
Thank you in advance!
Upvotes: 2
Views: 86
Reputation: 6769
#! /bin/bash
awk -F, '{
if (NR != 1)
{
if (sum[$13] == "")
{
sum[$13]=0
}
sum[$13]+=$5
}
}
END {
print "<TABLE BORDER = \"1\">"
print "<TR><TH>Gender</TH><TH>Total Amount [$]</TH></TR>"
for ( gender in sum )
{
print "<TR><TD>"gender"</TD>", "<TD>"sum[gender]"</TD></TR>"
}
print "</TABLE>"
}' table.csv
Here try this if it works for you.
UPDATE:
What I understand from your comment is that you want to sort data as per the sum.
#! /bin/bash
awk -F, -v OFS=, '{
if (NR != 1)
{
if (sum[$13] == "")
{
sum[$13]=0
}
sum[$13]+=$5
}
}
END {
for ( gender in sum )
{
print gender, sum[gender]
}
}' table.csv | sort -nk 2,2 |
awk -v firstline="$(sed -n '1p' table.csv)" '{
printrow($0)
}
BEGIN {
split(firstline, headers, ",")
print "<html>"
print "<TABLE BORDER = "1">"
printrow(headers[5]","headers[13], 1)
}
END {
print "</table>"
print "</html>"
}
function printrow(row, flag)
{
# if flag == 0 or null "<TD>" else "<TH>"
len = split(row, cells, ",")
print "<TR>"
for (i = 1 ; i <= len ; ++i)
{
if (!flag)
print "<TD>"cells[i]"</TD>"
else
print "<TH>"cells[i]"</TH>"
}
print "</TR>"
}'
Above, I have basically divided what you need into 2 modules,
1) Just organises the table
2) Sorts data as per the 2nd column. This one I should have had done in the first awk
script itself but it was a little shorter this way.
The second awk
script receives output from the first one.
It sets the headings and tags.
I feel its more modular this way. This just makes it easier to make modifications. First script for data manipulation and second for placing headers or tags.
What I would personally like is giving the second awk
script its own executable file. Now simply using first script for data manipulation and then passing it to another script for setting html tags and headers.
There might be better alternatives, I suggested the best I knew.
Upvotes: 1