Artificial intelligence is steadily making its way into every industry, with providers such as Mriyar already offering AI-powered platforms for hassle-free integration with third-party marketplaces. When the client turned to us, the project infrastructure had long passed the MVP stage, but it still needed resource utilization tooling (integration of a monitoring deployment playbook via Ansible) and migration from AWS to a private cloud environment in order to save on the infrastructure budget and keep up with resource scaling.
The project infrastructure was initially planned and structured by the client's development team, so we needed a good deal of specialized expertise to identify project bottlenecks and to optimize and improve the underlying processes before moving the product into production. The project was wide in scope from the get-go, as it was intended to process extensive volumes of data, and this is exactly why the client found it difficult to manage the cloud infrastructure budget and monitor resource utilization without additional assistance.
| Client | Location | Project goal | Team | Project timeframe |
| --- | --- | --- | --- | --- |
| BigData (ML)-based automotive startup for the B2C and B2B market niches | | Configure thorough monitoring of the major high-load project services; create functional dashboards with a large number of system and custom business metrics; describe the configurations in Ansible playbooks and integrate them with the main project playbook. Compose a plan for stage-by-stage migration to dedicated capacities to minimize downtime; conduct the full migration and test the performance of all services. Compose a plan for flexible project scaling and migration to Kubernetes. | 2 Lead DevOps Engineers | Consulting services have been provided since March 2020 |
Scope of tasks:
- Installation of Prometheus monitoring and deployment of Prometheus operators in Docker;
- Deployment of monitoring and Grafana using Ansible, followed by configuration and deployment of Alertmanager;
- Integration with the general Ansible playbook;
- Addition of custom-metric rules for services such as PostgreSQL, PgBouncer, Kafka, ClickHouse, ScyllaDB, HAProxy, etc.;
- Preparation and demonstration of the project migration plan;
- Composition of a Terraform script for infrastructure provisioning and deployment of environments (development and production ones);
- Preparation of Ansible roles for main high-load project services;
- Moving of databases via the backup-restore approach;
- Connection of S3 bucket for static data;
- Configuration of the secrets storage, role matrix, and firewall;
- Switching of DNS records and adjustment of the CloudFlare service.
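The monitoring-related tasks above can be sketched as an Ansible playbook. This is a minimal illustration only: the role names, host group, and variables here (`prometheus`, `grafana`, `alertmanager`, `monitoring`) are assumptions, not the project's actual playbook.

```yaml
# monitoring.yml -- hypothetical sketch; role names, the host group,
# and all variables are placeholders, not the client's real config.
- name: Deploy the monitoring stack
  hosts: monitoring
  become: true
  roles:
    - role: prometheus          # runs the Prometheus server in Docker
      vars:
        prometheus_scrape_interval: 15s
    - role: grafana             # provisions dashboards and data sources
      vars:
        grafana_datasource_url: "http://localhost:9090"
    - role: alertmanager        # routes fired alerts to notification channels
```

A playbook structured as roles like this can be pulled into the main project playbook with `import_playbook` or by listing the roles in the existing play, which is what made the integration step straightforward.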
With this project, we had the genuine pleasure of working with a client who had clear goals and set transparent requirements. We were asked to:
- Plan out and conduct smooth project migration;
- Adjust business-efficient system monitoring;
- Provide convenient remote client-side communication with the dev team;
- Cut the project infrastructure budget as much as possible;
- Grant reliable tech support;
- Implement project optimization tasks based on the major scaling plan.
During the first two stages of the project – monitoring and migration – we worked with the existing infrastructure solutions and client-defined technologies. The third stage – migration to Kubernetes – required the dev team to get rid of any serverless solutions and focus heavily on managing resource utilization. We also declaratively configured clusters for the project environments and designed the new project infrastructure.
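To give a flavor of the declarative approach used for provisioning the environments, here is a Terraform-style sketch. The resource type and names are purely illustrative (there is no `example_server` provider); the real configuration targeted the client's private cloud.

```hcl
# environments.tf -- illustrative only; the provider, resource type,
# names, and sizing are hypothetical, not the project's real setup.
variable "environment" {
  type    = string
  default = "development"   # the project ran "development" and "production"
}

resource "example_server" "app" {
  # production gets more capacity than development
  count = var.environment == "production" ? 3 : 1
  name  = "app-${var.environment}-${count.index}"
}
```

Keeping the environment as a variable lets the same script provision both the development and production environments, which matches the two-environment deployment described above.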
How we did it:
- First off, we configured the Prometheus+Grafana services at a basic level and approved the set of desired metrics to be included in the dashboard. We also described the logic in Ansible roles, which we integrated with the general playbook;
- Optimized the performance of a centralized logging system – ELK – in order to lower resource consumption rates;
- Analyzed infrastructure metrics and added new custom indicators to the dashboard to be used by the client’s team;
- Prepared automations for new dedicated infrastructure deployment;
- Prepared Ansible playbooks for necessary services and finalized the project migration plan;
- Moved high-load databases and re-initiated the project deployment flow;
- Switched domains and launched the production stage;
- Currently, we are handling preparations for migrating the whole project to Kubernetes.
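The custom indicators mentioned above can be backed by Prometheus alerting rules. The snippet below is a rough sketch for PostgreSQL: `pg_stat_activity_count` is a standard postgres_exporter metric, but the threshold, duration, and labels are assumptions rather than the project's actual rules.

```yaml
# postgres-alerts.yml -- illustrative rule; the threshold (180) and
# severity label are assumptions, not the client's real values.
groups:
  - name: postgresql
    rules:
      - alert: PostgresTooManyConnections
        expr: sum(pg_stat_activity_count) > 180
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL connection count is approaching the limit"
```

Rules like this feed Alertmanager, which then routes notifications, while the same expressions can drive Grafana dashboard panels.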
DevOps tech stack
- Ansible, Terraform;
- Apache Kafka, ZooKeeper;
- Python (Flask, aiohttp, Django); Vue; NLP; ML/AI;
- NoSQL (ClickHouse, ScyllaDB);
- Prometheus+Grafana, Graylog, Sentry.
As a result of our long-term efforts, we set up detailed monitoring of over 35 servers and 7 services, which also enabled us to outline a clear plan for optimizing the infrastructure and its underlying processes. The infrastructure budget is projected to shrink by approximately 40%. On top of that, we formed a scope of tasks for further optimizing the system so that all services running within the project use resources rationally.
Do you have a complex running infrastructure that requires minor or major optimizations and changes? Contact us if you are interested in achieving similar results by employing the field expertise of some of the most seasoned IT professionals out there.
Dmitry has 5 years of professional IT experience developing numerous consumer & enterprise applications. Dmitry has also implemented infrastructure and process improvement projects for businesses of various sizes. Due to his broad experience, Dmitry quickly understands business needs and improves processes by using established DevOps tools supported by Agile practices. The areas of Dmitry’s expertise are extensive, namely: version control, cloud platform automation, virtualization, Atlassian JIRA, software development lifecycle, Confluence, Slack, Service Desk, Flowdock, Bitbucket, and CI/CD.