Reputation: 712
I have a data that has location
, sentiment
and brand
fields. I want to calculate the number of positives, negatives and neutrals in a location for a brand.
Assuming x
has the data, I did:
a1 = GROUP x BY (location, brand);
a2 = FOREACH a1 GENERATE FLATTEN(group) AS (location, brand), COUNT(x.sentiment=="positive"?1:0) AS positive_count, COUNT(x.sentiment=="negative"?1:0) AS negative_count, COUNT(x.sentiment=="neutral:?1:0) as neutral_count;
But I am getting a syntax error saying Unexpected character '"'
I tried grouping by all three: location, sentiment and brand
but I am getting only the overall count like:
{location: "newyork", brand: "pampers", sentiment = "positive", count = 10}
{location: "newyork", brand: "pampers", sentiment = "negative", count = 2}
{location: "newyork", brand: "pampers", sentiment = "neutral", count = 20}
I want seperate fields for positives_count, negatives_count and neutrals_count. Something like this:
{location: "newyork", brand: "pampers", positive_count = 10, negative_count = 2, neutral_count = 20}
{location: "london", brand: "pampers", positive_count = 12, negative_count = 0, neutral_count = 35}
{location: "newyork", brand: "huggies", positive_count = 40, negative_count = 6, neutral_count = 10}
Can some one help me out please?
Upvotes: 0
Views: 161
Reputation: 11090
Use single quotes
a1 = GROUP x BY (location, brand);
a2 = FOREACH a1 GENERATE FLATTEN(group) AS (location, brand),
COUNT(x.sentiment=='positive'?1:0) AS positive_count,
COUNT(x.sentiment=='negative'?1:0) AS negative_count,
COUNT(x.sentiment=='neutral'?1:0) as neutral_count;
EDIT
newyork pampers positive
newyork pampers positive
newyork pampers negative
newyork pampers positive
newyork pampers positive
newyork pampers neutral
newyork pampers positive
newyork pampers negative
newyork pampers neutral
newyork pampers positive
newyork pampers positive
newyork pampers neutral
Script
B = GROUP A BY (location,brand);
C = FOREACH B {
A1 = FILTER A BY sentiment matches 'positive';
A2 = FILTER A BY sentiment matches 'negative';
A3 = FILTER A BY sentiment matches 'neutral';
GENERATE FLATTEN(group) as (location,brand),COUNT(A1),COUNT(A2),COUNT(A3);
};
Output
Upvotes: 0
Reputation: 712
I filtered the alias that contained the original data and counted each number of entries and joined them all.
p = FILTER y BY (sentiment == 'positive');
p1 = GROUP p BY (location, brand, avl_author_type);
p2 = FOREACH p1 GENERATE FLATTEN(group) AS (location, brand, avl_author_type), COUNT(p) AS positive_counts;
n = FILTER y BY (sentiment == 'negative');
n1 = GROUP n BY (location, brand, avl_author_type);
n2 = FOREACH n1 GENERATE FLATTEN(group) AS (location, brand, avl_author_type), COUNT(n) AS negative_counts;
ne = FILTER y BY (sentiment == 'neutral');
ne1 = GROUP ne BY (location, brand, avl_author_type);
ne2 = FOREACH ne1 GENERATE FLATTEN(group) AS (location, brand, avl_author_type), COUNT(ne) AS neutral_counts;
j1 = JOIN p2 BY (location, brand, avl_author_type) LEFT OUTER, n2 BY (location, brand, avl_author_type);
j2 = FOREACH j1 GENERATE p2::location as location, p2::brand as brand, p2::avl_author_type as avl_author_type, p2::positive_counts as positive_counts, n2::negative_counts as negative_counts;
j3 = JOIN j2 BY (location, brand, avl_author_type) LEFT OUTER, ne2 BY (location, brand, avl_author_type);
j4 = FOREACH j3 GENERATE j2::location as location, j2::brand as brand, j2::avl_author_type as avl_author_type, j2::positive_counts as positive, j2::negative_counts as negative, ne2::neutral_counts as neutral;
Kind of lengthy but worked.
Upvotes: 0