Joe

Reputation: 4234

Ramda.js: group array with arguments

List to group:

const arr = [
  {
    "Global Id": "1231",
    "TypeID": "FD1",
    "Size": 160,
    "Flöde": 55,
  },
  {
    "Global Id": "5433",
    "TypeID": "FD1",
    "Size": 160,
    "Flöde": 100,
  },
  {
    "Global Id": "50433",
    "TypeID": "FD1",
    "Size": 120,
    "Flöde": 100,
  },
  {
    "Global Id": "452",
    "TypeID": "FD2",
    "Size": 120,
    "Flöde": 100,
  },
]

Input to the function, specifying which keys to group on:

const columns = [
    {
      "dataField": "TypeID",
      "summarize": false,
    },
    {
      "dataField": "Size",
      "summarize": false,
    },
    {
      "dataField": "Flöde",
      "summarize": true,
    },
]

Expected output:

const output = [
    {
      "TypeID": "FD1",
      "Size": 160,
      "Flöde": 155 // 55 + 100
      "nrOfItems": 2
    },
    {
       "TypeID": "FD1",
       "Size": 120,
       "Flöde": 100,
       "nrOfItems": 1  
    },
    {
       "TypeID": "FD2",
       "Size": 120,
       "Flöde": 100,
       "nrOfItems": 1  
    }
  ]

  // nrOfItems adds up to 4 (2 + 1 + 1), the total number of items.

Function:

const groupArr = (columns) => R.pipe(...);

The "summarize" property tells if the property should summarize or not.

The dataset is very large (100k+ items), so I don't want to iterate more than necessary.

I've looked at R.groupBy, but I'm not sure it can be applied here?

Maybe something with R.reduce? Store each group in the accumulator, summarize the values, and add to the count if the group already exists? I need to find the group fast, so maybe store the group as a key?

Or is it better to use vanilla JavaScript in this case?
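Something like this is what I have in mind, a single-pass sketch in plain JavaScript (untested, just to illustrate the keyed-accumulator idea with the columns format above):

```javascript
// Untested sketch: one pass over the data, with each group stored
// under a string key in the accumulator object.
const groupArr = (columns) => (arr) => {
  const groupKeys = columns.filter(c => !c.summarize).map(c => c.dataField);
  const sumKeys = columns.filter(c => c.summarize).map(c => c.dataField);
  const acc = {};
  for (const item of arr) {
    // Build a lookup key from the grouping values
    const key = JSON.stringify(groupKeys.map(k => item[k]));
    if (!acc[key]) {
      acc[key] = { nrOfItems: 0 };
      for (const k of groupKeys) acc[key][k] = item[k];
      for (const k of sumKeys) acc[key][k] = 0;
    }
    // Sum the summarize-columns and count the item
    for (const k of sumKeys) acc[key][k] += item[k];
    acc[key].nrOfItems += 1;
  }
  return Object.values(acc);
};
```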

Upvotes: 1

Views: 84

Answers (2)

Scott Sauyet

Reputation: 50797

Here's my initial approach. Everything but summarize is a helper function, which I suppose could be inlined if you really wanted; I find it cleaner with this separation.

const getKeys = (val) => pipe (
  filter (propEq ('summarize', val) ),
  pluck ('dataField')
) 

const keyMaker = (columns, keys = getKeys (false) (columns)) => pipe (
  pick (keys),
  JSON .stringify
)

const makeReducer = (
  columns,
  toSum = getKeys (true) (columns),
  toInclude = getKeys (false) (columns),
) => (a, b) => ({
  ...mergeAll (map (k => ({ [k]: b[k] }), toInclude ) ),
  ...mergeAll (map (k => ({ [k]: (a[k] || 0) + b[k] }), toSum ) ),
  nrOfItems: (a .nrOfItems || 0) + 1
})

const summarize = (columns) => pipe (
  groupBy (keyMaker (columns) ),
  values,
  map (reduce (makeReducer (columns), {} ))
)

const arr = [{"Flöde": 55, "Global Id": "1231", "Size": 160, "TypeID": "FD1"}, {"Flöde": 100, "Global Id": "5433", "Size": 160, "TypeID": "FD1"}, {"Flöde": 100, "Global Id": "50433", "Size": 120, "TypeID": "FD1"}, {"Flöde": 100, "Global Id": "452", "Size": 120, "TypeID": "FD2"}]
const columns = [{"dataField": "TypeID", "summarize": false}, {"dataField": "Size", "summarize": false}, {"dataField": "Flöde", "summarize": true}]

console .log (
  summarize (columns) (arr)
)
<script src="https://bundle.run/[email protected]"></script><script>
const {pipe, filter, propEq, pluck, pick, mergeAll, map, groupBy, values, reduce} = ramda</script>

There is a lot of overlap with the solution from Joe, but also some real differences. His was already posted when I saw the question, but I wanted my own approach not to be influenced, so I didn't look until I wrote the above. Note the difference in our hash functions. Mine does JSON.stringify on values like {TypeID: "FD1", Size: 160} while Joe's creates "GROUPKEY___FD1___160". I think I like mine better for the simplicity. On the other hand, Joe's solution is definitely better than mine in handling nrOfItems. I updated it on each reduce iteration and have to use an || 0 to handle the initial case. Joe simply starts the fold with the already-known value. But overall, the solutions are quite similar.
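To make the comparison concrete, here are the two key styles applied to the first item of the sample data (illustrative only; a plain-JS stand-in for R.pick):

```javascript
// The two hash-key styles side by side, applied to the first sample item.
const item = { "Global Id": "1231", TypeID: "FD1", Size: 160, "Flöde": 55 };
const groupKeys = ["TypeID", "Size"];

// Plain-JS stand-in for R.pick, preserving the order of groupKeys:
const pick = (keys, obj) =>
  Object.fromEntries(keys.map(k => [k, obj[k]]));

// JSON.stringify over the picked properties:
const jsonKey = JSON.stringify(pick(groupKeys, item));
// '{"TypeID":"FD1","Size":160}'

// Composed string key:
const stringKey = ["GROUPKEY", ...groupKeys.map(k => item[k])].join("___");
// 'GROUPKEY___FD1___160'
```

Note that the JSON key depends on the property order of the picked object; picking the keys in a fixed order, as above, keeps it deterministic.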

You mention wanting to reduce the number of passes through the data. The way I write Ramda code tends not to help with this. This code iterates the whole list to group it into like items, then iterates through each of those groups to fold down to individual values. (Also there is perhaps a minor iteration in values.) These could certainly be changed to combine those two iterations. It might even make for shorter code. But to my mind, it would become harder to understand.

Update

I was curious about the single-pass approach, and found that I could use all the infrastructure I built for the multi-pass one, rewriting only the main function:

const summarize2 = (columns) => (
  arr,
  makeKey = keyMaker (columns),
  reducer = makeReducer (columns)
) => values (reduce (
  (a, item, key = makeKey (item) ) => assoc (key, reducer (key in a ? a[key]: {}, item), a),
  {},
  arr
))

console .log (
  summarize2 (columns) (arr)
)

I wouldn't choose this over the original unless testing showed that this code was a bottleneck in my application. But it's not as much more complex as I thought it would be, and it does everything in one iteration (well, except for whatever values does). Interestingly, it makes me change my mind a bit about the handling of nrOfItems. My helper code just worked in this version, and I never had to know the total size of the group. That wouldn't have happened if I used Joe's approach.
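The two counting styles can be contrasted in isolation (standalone sketch, not either answer's exact code):

```javascript
// (a) Count inside the reducer, with || 0 guarding the first pass
//     against an empty initial accumulator:
const group = [{ v: 55 }, { v: 100 }];
const a = group.reduce(
  (acc, x) => ({ v: (acc.v || 0) + x.v, nrOfItems: (acc.nrOfItems || 0) + 1 }),
  {}
);

// (b) Seed the fold with the already-known group size:
const b = group.reduce(
  (acc, x) => ({ ...acc, v: (acc.v || 0) + x.v }),
  { nrOfItems: group.length }
);

console.log(a); // { v: 155, nrOfItems: 2 }
console.log(b); // { nrOfItems: 2, v: 155 }
```

Style (a) needs no knowledge of the group up front, which is why it carries over unchanged to the single-pass version, where the final group size is never known in advance.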

Upvotes: 2

user3297291

Reputation: 23372

Here's an answer in vanilla JavaScript first, because I'm not super familiar with the Ramda API. I'm pretty sure the approach is quite similar with Ramda.

The code has comments explaining every step. I'll try to follow up with a rewrite to Ramda.

const arr=[{"Global Id":"1231",TypeID:"FD1",Size:160,"Flöde":55},{"Global Id":"5433",TypeID:"FD1",Size:160,"Flöde":100},{"Global Id":"50433",TypeID:"FD1",Size:120,"Flöde":100},{"Global Id":"452",TypeID:"FD2",Size:120,"Flöde":100}],columns=[{dataField:"TypeID",summarize:!1},{dataField:"Size",summarize:!1},{dataField:"Flöde",summarize:!0}];

// The columns that don't summarize
// give us the keys we need to group on
const groupKeys = columns
  .filter(c => c.summarize === false)
  .map(g => g.dataField);

// We compose a hash function that creates
// a hash from each item's properties
// that are in our groupKeys
const groupHash = groupKeys
  .map(k => x => x[k])
  .reduce(
    (f, g) => x => `${f(x)}___${g(x)}`,
    () => "GROUPKEY"
  );

// The columns that summarize tell us which
// properties to sum for the items within the
// same group
const sumKeys = columns
  .filter(c => c.summarize === true)
  .map(c => c.dataField);
  
// Again, we compose into a single function.
// This function merges two items, taking the
// "last" item's values and applying the sum
// logic only for keys in sumKeys
const concats = sumKeys
  .reduce(
    (f, k) => (a, b) => Object.assign(f(a, b), {
      [k]: (a[k] || 0) + b[k]
    }),
    (a, b) => Object.assign({}, a, b)
  )

// Now, we take our data and group by the groupHash
const groups = arr.reduce(
  (groups, x) => {
    const k = groupHash(x);
    if (!groups[k]) groups[k] = [x];
    else groups[k].push(x);
    return groups;
  },
  {}
);

// These are the keys we want our final objects to have...
const allKeys = ["nrTotal"]
  .concat(groupKeys)
  .concat(sumKeys);
  
// ...baked in to a helper to remove other keys
const cleanKeys = obj => Object.assign(
  ...allKeys.map(k => ({ [k]: obj[k] }))
);

// With the items neatly grouped, we can reduce each
// group using the composed concatenator
const items = Object
  .values(groups)
  .flatMap(
    xs => cleanKeys(
      xs.reduce(concats, { nrTotal: xs.length })
    ),
  );

console.log(items);

Here's an attempt at porting to Ramda, but I didn't get much further than replacing the vanilla JS methods with their Ramda equivalents. Curious to see which cool utilities and functional concepts I missed! I'm sure somebody more knowledgeable on the Ramda specifics will chime in!

const arr=[{"Global Id":"1231",TypeID:"FD1",Size:160,"Flöde":55},{"Global Id":"5433",TypeID:"FD1",Size:160,"Flöde":100},{"Global Id":"50433",TypeID:"FD1",Size:120,"Flöde":100},{"Global Id":"452",TypeID:"FD2",Size:120,"Flöde":100}],columns=[{dataField:"TypeID",summarize:!1},{dataField:"Size",summarize:!1},{dataField:"Flöde",summarize:!0}];


const [ sumCols, groupCols ] = R.partition(
  R.prop("summarize"), 
  columns
);

const groupKeys = R.pluck("dataField", groupCols);
const sumKeys = R.pluck("dataField", sumCols);

const grouper = R.reduce(
  (f, g) => x => `${f(x)}___${g(x)}`,
  R.always("GROUPKEY"),
  R.map(R.prop, groupKeys)
);

const reducer = R.reduce(
  (f, k) => (a, b) => R.mergeRight(
    f(a, b),
    { [k]: (a[k] || 0) + b[k] }
  ),
  R.mergeRight,
  sumKeys
);

const allowedKeys = new Set(
  [ "nrTotal" ].concat(sumKeys).concat(groupKeys)
);

const cleanKeys = R.pipe(
  R.toPairs,
  R.filter(([k, v]) => allowedKeys.has(k)),
  R.fromPairs
);

const items = R.flatten(
  R.values(
    R.map(
      xs => cleanKeys(
        R.reduce(
          reducer,
          { nrTotal: xs.length },
          xs
        )
      ),
      R.groupBy(grouper, arr)
    )
  )
);

console.log(items);
<script src="https://cdnjs.cloudflare.com/ajax/libs/ramda/0.26.1/ramda.min.js"></script>

Upvotes: 2
