riorio

Reputation: 6836

possible use cases for kafka / storm / spark in a B2C web site

My company has a B2C web site that serves several thousand web users a day.

It uses PHP, angular.js and MySQL.

We'd like to step into new technological domains, and we saw that some of the hot trends are Kafka / Spark / Storm.

How can we use these technologies in our architecture and how can we benefit from them?

Thanks

Upvotes: 0

Views: 211

Answers (2)

miguno

Reputation: 15087

There are some common motivations for migrating to the new kind of stack that you describe above, such as:

  • Decoupling: For example, application A of team 1 should not impact application B of team 2. You should be able to develop and deploy the two applications independently ("A/team 1 does not need to wait for B/team 2, and vice versa"), and e.g. a surge of load on A should likewise not cause collateral damage for B. Event sourcing, CQRS, micro-services etc. are concepts that all help with decoupling. (For a brainteaser on "decoupling" in the wider sense, I'd recommend watching Simple Made Easy by Rich Hickey, the creator of Clojure and co-founder of Datomic).
  • Scalability, elasticity, fault-tolerance, being "reactive" (this is related to decoupling): For example, you may need more than a single machine (oftentimes dozens) to process the incoming data of your application or to serve requests to clients. Your applications should also respond dynamically to increased/decreased capacity demands, which is one of the ideas behind the Reactive Manifesto (cf. the recent whitepaper of Lightbend, formerly Typesafe, the company behind Scala and Akka).

Whether or not it makes sense in your situation to migrate away from what you already have is of course up to you to evaluate and decide. For example, maybe your current setup already covers these benefits well enough for your needs, maybe not.

But if you'd like to go in this direction, here are some further pointers to get you started:

  • The Log: What every software engineer should know about real-time data's unifying abstraction, by Jay Kreps, one of the creators of Apache Kafka

  • Event Sourcing, micro-services. To give you some concrete examples, you can read up on retailers such as Walmart (blog post) but also large financial companies such as Capital One (slides/talk from StrangeLoop 2016) adopting these concepts. If you'd like to read a higher-level overview of all that, you may want to take a look at Event sourcing, CQRS, stream processing, and Apache Kafka: What's the connection?. A key idea is that, via event sourcing, you are essentially exploiting the benefits of immutability on the architecture level (another Rich Hickey talk, The Value of Values, explains why immutability/values are so important). This means you have an immutable "accounting ledger" of all the events that have ever happened in your application (e.g. "customer ABC bought item XYZ at time T"), which you can exploit for things such as re-processing historical data (e.g. to fix a bug you discovered in production), A/B testing (on the same set of historical data), and so on (see the sketch after this list).

  • Another good article IMHO is The Data Dichotomy: Rethinking the Way We Treat Data and Services, which ties back into the aforementioned topics and also explains -- at a conceptual level -- why nowadays many engineers re-design their architectures with technologies such as Kafka or Spark. As the author writes, one motivation here is the idea of "scaling" in people terms because, unlike computers, the human brain does not double its capacity every 18 months.
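
To make the "accounting ledger" idea above a bit more tangible, here is a minimal sketch in Python using the kafka-python client. The topic name "purchases", the broker address, and the event fields are all invented for illustration (and in your stack the producing side would be a PHP Kafka client rather than Python); the essential point is that events are only ever appended, and any consumer can re-read the topic from the beginning to reprocess history:

    from json import dumps, loads
    from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

    # Append an immutable fact to the ledger (hypothetical topic "purchases").
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: dumps(v).encode("utf-8"),
    )
    producer.send("purchases", {"customer": "ABC", "item": "XYZ", "ts": 1478131200})
    producer.flush()

    # Replay the full history, e.g. to rebuild state, fix a bug discovered in
    # production, or run an A/B test against the same set of historical data.
    consumer = KafkaConsumer(
        "purchases",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",  # start from the oldest retained event
        value_deserializer=lambda v: loads(v.decode("utf-8")),
    )
    for event in consumer:  # blocks and keeps reading; fine for a sketch
        print(event.value)

Because the ledger is never mutated in place, re-processing is just another consumer reading the same history from offset zero.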

If you'd like to learn more about this topic, I'd recommend Martin Kleppmann's short and free ebook Making Sense of Stream Processing: How can event streams help make your application more scalable, reliable, and maintainable (IIRC it's about 60 pages). Martin is also writing a longer book, Designing Data-Intensive Applications, which IIRC is scheduled to be published in its final form in March 2017; you can already access the current almost-complete draft via O'Reilly Early Access.

Upvotes: 1

Darshan

Reputation: 2333

The technologies you are currently using form a web application stack, whereas technologies like Kafka, Spark, and Storm serve a different purpose altogether. So I'll explain what each of them is used for and how it can help you.

Kafka is a distributed streaming platform. In layman's terms, it is a queueing mechanism: if your application has some kind of backend process that runs on a cluster, your PHP backend can send request data to that backend process through Kafka.
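
For illustration, enqueuing such request data from the web tier could look roughly like the following Python sketch using the kafka-python client (the topic name "requests" and the payload fields are invented for the example; in your case the producer would be a PHP Kafka client, but the shape is the same):

    from json import dumps
    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: dumps(v).encode("utf-8"),
    )

    # The web tier enqueues work for the backend process and returns
    # immediately, instead of doing the heavy lifting in the request cycle.
    producer.send("requests", {"user_id": 42, "action": "generate-report"})
    producer.flush()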

Apache Spark is mainly used for large-scale data processing. It also provides a streaming feature (mini-batch streaming), graph APIs, and ML APIs. When you want to process huge amounts of data on a cluster, you should consider it.
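
As a rough sketch of what the batch-processing side might look like, here is a small PySpark job; the input path and the field names are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("purchase-stats").getOrCreate()

    # Hypothetical directory of JSON event logs, one event per line.
    events = spark.read.json("/data/purchase-events/")

    # Example aggregation: the ten most frequently bought items.
    events.groupBy("item").count().orderBy("count", ascending=False).show(10)

    spark.stop()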

Apache Storm is a distributed realtime computation system. To my understanding, it provides better streaming capabilities for realtime processing of data (per-event rather than in mini-batches).

In summary, all these technologies are meant for distributed processing with realtime capabilities. If you want to incorporate any of these systems, your PHP backend might act as a middleman that uses them on behalf of the end user. You might also want to run multiple instances of your PHP backend so that it does not become a bottleneck.
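
The backend-process side of that middleman pattern could be a small consumer worker, sketched here in Python with kafka-python (topic and group names invented). Running several copies of the worker in the same consumer group spreads the load across them:

    from json import loads
    from kafka import KafkaConsumer  # pip install kafka-python

    # Workers sharing a group_id split the topic's partitions among
    # themselves, so you can add instances to avoid a bottleneck.
    consumer = KafkaConsumer(
        "requests",
        bootstrap_servers="localhost:9092",
        group_id="request-workers",
        value_deserializer=lambda v: loads(v.decode("utf-8")),
    )

    for message in consumer:
        handle(message.value)  # hypothetical handler for one unit of work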

Upvotes: 0
