Building solutions together

Big data brings with it many advantages, including the ability to discover otherwise unidentifiable trends and to process data in near real time.

At TekWissen we make sure that we provide you with a solution that captures all of these benefits of Cloud Computing and Big Data, whilst still effectively addressing any technical challenges using the best tools available.

  • Implementation of horizontally scalable solutions, from the initial load-balancing, web and business-logic servers to data storage, transformation and analysis
  • Usage of different types of NoSQL data stores, such as Cassandra, HBase, Riak, Redis, MongoDB and others
  • Real-time processing based on data streaming and interactive query frameworks such as Google Dremel, Storm, HStreaming and Percolator
  • Set-up of Map-Reduce processes based on Hadoop, Cascading, Hive, Pig, Cascalog and other systems
  • Usage of advanced data analytics platforms such as Vertica, Pentaho and Pivotal (Greenplum)

Architectural design for distributed scalable systems

Building large-scale distributed platforms is no easy task: it requires a deep understanding of the challenges posed by increased data volume, velocity and variety. Proper use of existing and evolving software frameworks and products mitigates these risks, but it still requires knowledge of the Big Data landscape and of the pros and cons of each concrete solution.

TekWissen employs seasoned architects who will help to gather requirements, pick appropriate solutions and design a system that efficiently meets each client's needs.

Development and quality assurance of big data solutions

TekWissen has a proven track record in successful delivery of distributed scalable solutions and has accumulated a substantial amount of knowledge and expertise in this area.

Our engineers and QA experts are well aware of the challenges Big Data poses and are up to the task of building and testing your product the right way.

Operations of big data solutions

TekWissen has successfully shipped projects that run on in-house infrastructure as well as on Amazon Elastic Compute Cloud (EC2), Rackspace and other cloud providers. Our DevOps, Systems Engineering and Network Operations Control teams offer the following services:

  • Infrastructure setup: system hardware requirements, network topology and host configurations are determined to meet the current and future needs of production systems
  • Deployment automation: all steps needed to roll out new releases are automated to ensure smooth and continuous delivery
  • Backup and recovery: disaster recovery scenarios are defined and backup servers are prepared
  • Operational readiness: failure notifications are set up and system monitoring is established

Big data platforms

The main Big Data management platforms that we embrace at TekWissen are:

NoSQL storages

The term NoSQL describes a wide family of data storage products that employ less constrained consistency models than traditional relational databases. NoSQL solutions are used either alongside or instead of an RDBMS to improve a system's data throughput, achieve linear scalability and efficiently store unstructured data. The TekWissen team actively uses the following NoSQL stores: MongoDB, HBase, Cassandra, Riak, Redis, Infinispan and others.

Horizontally-scalable offline batch processing

Web-scale data often exceeds the storage and memory capacity of a single machine, while the variety of data makes it difficult to persist using traditional approaches.

Apache Hadoop is the industry-standard implementation of the Map-Reduce pattern, which is typically used for offline batch processing tasks. Hadoop scales out by adding machines, provides schema-free storage and makes sure the data is redundantly distributed across a cluster of machines. We also use tools such as Cascading, Hive, Pig and Cascalog alongside Hadoop to optimize Big Data processing tasks.
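To illustrate the Map-Reduce pattern itself, here is a minimal single-process sketch in Python, assuming the classic word-count task. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are hypothetical, and in a real cluster each phase would run in parallel on many machines, with the framework performing the shuffle over the network.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Hypothetical mapper: emit (word, 1) for every word in one input record.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key; a framework like Hadoop does this between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Hypothetical reducer: combine all values for one key into a single result.
    return key, sum(values)


def map_reduce(documents: list[str]) -> dict[str, int]:
    # Run map over every record, shuffle by key, then reduce each key group.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = map_reduce(["big data big value", "data at scale"])
print(counts)  # {'big': 2, 'data': 2, 'value': 1, 'at': 1, 'scale': 1}
```

Because the mapper and reducer only see one record or one key at a time, the same logic can be spread across a cluster without changing it, which is what makes the pattern suitable for batch jobs over the entire dataset.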

Real-time processing, data streaming frameworks

Batch processing frameworks like Hadoop work well when there is a need to go through the entire dataset. When it comes to real-time processing of new data chunks and ad hoc analysis, another family of technologies emerges, which we actively research and embrace at TekWissen. Percolator, for example, is Google's system for incrementally processing updates to a large data set; Google adopted it in place of batch Map-Reduce jobs to build its web search index.
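The difference between batch and incremental processing can be sketched with a toy inverted index. This is a minimal, hypothetical Python example, not Percolator's API: the `IncrementalIndex` class and its `upsert` and `lookup` methods are illustrative names. When a document changes, only the index entries for its added or removed terms are touched, instead of rebuilding the whole index from scratch.

```python
from __future__ import annotations

from collections import defaultdict


class IncrementalIndex:
    """Hypothetical inverted index updated per document rather than rebuilt in a batch."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)  # term -> ids of docs containing it
        self.doc_terms: dict[str, set[str]] = {}               # doc id -> its current set of terms

    def upsert(self, doc_id: str, text: str) -> None:
        new_terms = set(text.lower().split())
        old_terms = self.doc_terms.get(doc_id, set())
        # Remove the document only from terms it no longer contains.
        for term in old_terms - new_terms:
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]
        # Add the document only to terms that are new for it.
        for term in new_terms - old_terms:
            self.postings[term].add(doc_id)
        self.doc_terms[doc_id] = new_terms

    def lookup(self, term: str) -> set[str]:
        # Return a copy so callers cannot mutate the index.
        return set(self.postings.get(term.lower(), set()))


index = IncrementalIndex()
index.upsert("page1", "hadoop batch jobs")
index.upsert("page2", "streaming jobs")
index.upsert("page1", "incremental jobs")  # a re-crawl of page1 updates only its changed terms
print(sorted(index.lookup("jobs")))  # ['page1', 'page2']
print(index.lookup("hadoop"))        # set()
```

The cost of each update is proportional to the size of the changed document, not the size of the corpus, which is the property that makes incremental systems attractive when fresh data arrives continuously.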

Google Dremel and Apache Drill (an open-source system inspired by Dremel) allow analysts to scan petabytes of data in seconds to answer ad hoc queries and can back interactive visualizations. Pregel is Google's system for large-scale graph processing, based on the bulk synchronous parallel model and running on clusters of commodity machines.
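The bulk synchronous parallel model behind Pregel can be illustrated with a small single-process sketch, assuming the textbook "propagate the maximum value" task on a graph. The function name `pregel_max_value` and the dictionary-based graph representation are hypothetical; a real Pregel deployment partitions vertices across machines. The key idea it shows is that computation proceeds in supersteps, and messages sent in one superstep are only visible in the next, after a global barrier.

```python
from __future__ import annotations


def pregel_max_value(
    values: dict[str, int], edges: dict[str, list[str]]
) -> tuple[dict[str, int], int]:
    """Propagate the maximum vertex value through a graph in BSP supersteps.

    Returns the final vertex values and the number of supersteps executed.
    """
    values = dict(values)
    # Superstep 0: every vertex is active and sends its value to its neighbours.
    inbox: dict[str, list[int]] = {v: [] for v in values}
    for vertex, value in values.items():
        for neighbour in edges.get(vertex, []):
            inbox[neighbour].append(value)
    supersteps = 1
    # Keep running while any vertex has pending messages.
    while any(inbox.values()):
        outbox: dict[str, list[int]] = {v: [] for v in values}
        for vertex, messages in inbox.items():
            best = max(messages) if messages else None
            if best is not None and best > values[vertex]:
                # The vertex learned a larger value: update it and notify neighbours.
                values[vertex] = best
                for neighbour in edges.get(vertex, []):
                    outbox[neighbour].append(best)
            # A vertex whose value did not change sends nothing (it "votes to halt").
        inbox = outbox  # Barrier: messages become visible only in the next superstep.
        supersteps += 1
    return values, supersteps


# Undirected chain a - b - c - d, with each edge listed in both directions.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final, steps = pregel_max_value({"a": 3, "b": 6, "c": 2, "d": 1}, graph)
print(final)  # {'a': 6, 'b': 6, 'c': 6, 'd': 6}
```

Each vertex only reads its own inbox and writes to its neighbours' outboxes, so within a superstep all vertices can be processed independently; the barrier between supersteps is what keeps the computation deterministic.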

Advanced analytics and business intelligence tools

Business Intelligence (BI) is an umbrella term that includes the applications, infrastructure, tools and best practices that enable access to and analysis of information in order to improve and optimize decisions and performance.

Tools such as Tableau, QlikView and Pentaho provide elaborate toolsets for finding hidden patterns, meaningful correlations and trends within massive volumes of data.