DevOps for Data Science by bridging the gap between development and data pipelines

Abstract: The evolving technology landscape calls for the convergence of DevOps and Data Science, a pairing that combines innovation and efficiency to give organizations data-driven insights. Although traditionally associated with software development, DevOps has extended its influence to data science, creating a mutual relationship that bridges the gap between data analysis and development. This article explores the significance of implementing DevOps practices in data science, addressing the challenges and showing the benefits for organizations that aim to use the full potential of their data assets. The collaboration between DevOps and Data Science may initially seem an unlikely pairing, given their distinct focuses on software development and data analysis. However, it holds considerable promise for organizations looking to maximize the value extracted from their data. The article examines the DevOps for Data Science concept and shows how this collaboration accelerates decision-making by promoting faster, more reliable, and more insightful outcomes. The intersection of DevOps and Data Science in data-driven decision-making is important for business success. The article explains how DevOps, with its core principles of collaboration, automation, and continuous improvement, integrates into data science and addresses the challenges created by the traditional division between development and data analytics. DevOps practices change how organizations extract value from their data and reshape their approach to decision-making. The article emphasizes the practical application of DevOps in data science and its role in improving the reliability and efficiency of development overall. DevOps is steadily gaining recognition as a strong solution for breaking down traditional barriers between operations and developers in contemporary organizations.
By emphasizing efficient teamwork and automation, the article highlights how DevOps accelerates delivery speed, improving overall organizational performance and providing a competitive edge in the market.

Keywords: DevOps, Data Science, integration, machine learning models, automation, data analytics, and continuous integration

1. Introduction

Integrating DevOps into data science marks a major shift in how organizations use data to enhance business value. Siloed operations are becoming a thing of the past as DevOps practices systematically dismantle the barriers that traditionally separated development and data analytics [1]. This collaboration enables a seamless flow of information, allowing organizations to derive faster, more precise insights that directly affect decision-making processes.

DevOps acts as a bridge that brings together the different strengths of development and data science teams, promoting collaboration beyond the limitations of conventional structures [2]. The result is a merging of skills and expertise that speeds up innovation and pushes organizations toward success. As modern organizations strive to extract useful insights from varied data points, collaboration between DevOps and data science offers a purposeful and dynamic partnership. This shift positions organizations at the forefront of industry advancements.

2. Significance and Background of DevOps for Data Science

In traditional development environments, merging data science and conventional development processes brings different challenges. The innate disconnect between the objectives, methodologies, and tools applied by data scientists and their counterparts in traditional development teams poses some obstacles. While data scientists are immersed in data exploration, model building, and insight extraction, traditional development teams focus on delivering software products. The differences often result in the slow deployment of data-driven solutions and hinder collaborative efforts between these two domains.

The methodologies applied in data science, such as machine learning, statistical analysis, and data exploration, may not align seamlessly with the established processes of traditional software development [3]. This misalignment shows the critical need to bridge the gap between data science and development for organizations that aspire to harness data for decision-making and secure a competitive edge in today's data-centric landscape. Addressing this disparity is key. DevOps principles and practices offer a way to overcome these challenges and enhance the efficiency of data science projects.

Embracing DevOps methodologies allows data science teams to seamlessly integrate into the development pipeline, thus aligning their workflows with the development and operations complexities. Collaboration, automation, and continuous integration set organizations up for success by empowering data scientists to deliver helpful insights and predictive models quickly and accurately. The strategic integration of DevOps principles bridges the historical gap, thus promoting a coordinated coexistence between data science and development for organizations poised to thrive in an era driven by data [4].

3. DevOps Principles for Data Science

Implementing DevOps principles in data science is a strategic shift that combines collaborative practices and efficient methodologies. This approach aims to streamline data processing, modeling, and deployment, which minimizes deployment bottlenecks and ensures a high degree of consistency across the entire data pipeline. DevOps for data science is associated with accelerated time-to-insights [5], increased reliability through automated testing, and the cultivation of cross-functional collaboration. Despite the challenges arising from the varied nature of data and the need for specialized tools, the advantages of faster and more accurate insights are compelling. The core principles, and how they apply to the data science landscape, are as follows:

3.1. Collaboration

Collaboration fosters an environment where cross-functional teams seamlessly unite to achieve common goals. This collaboration is key for software development, which mirrors the need within the data science domain; the integration of these collaborative principles between data science, development, and domain expertise is advantageous and is key for the success of modern organizations. In data science, bringing together different perspectives of data scientists, developers, and domain experts is key [6]. It forms an alliance where every team's unique skills and insights contribute to a more comprehensive understanding of business objectives. Their expertise in data exploration and statistical analysis allows data scientists to collaborate closely with developers who excel in building robust and scalable systems. Including domain experts ensures that the data-driven insights derived from the analysis are accurate and aligned with the organization's main goals and vision. The collaboration ensures that data science projects are not pursued in isolation but align with the organization's strategic objectives.

3.2. Version control and reproducibility

Version control systematically tracks code, data, and model changes throughout the development and analysis processes. The tracking serves two purposes. First, it preserves a detailed historical record of the evolution of the data science project, allowing teams to revisit and understand the progression of analyses over time. Second, it ensures the reproducibility of analyses by providing a snapshot of the exact state of code and data at any given point in the project's timeline [7]. In data science, where analyses often include complex data manipulations, intricate modeling techniques, and different algorithms, version control becomes a safeguard against uncertainty and ambiguity.

Using well-established tools like Git, which has long been central to software development, and platforms like DVC, which is tailored to versioning data science projects, ensures that the version control process fits the needs of data scientists. By adopting these tools, data science teams ensure the reproducibility of their analyses and cultivate an environment conducive to collaborative workflows. Every iteration, improvement, or adjustment to the models or code becomes a trackable and transparent event. This transparency facilitates better collaboration within data science teams and enables cross-functional cooperation with development and operations teams within the DevOps framework.
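
The idea of pinning "the exact state of code and data" can be illustrated with a content hash. The following is a minimal sketch under stated assumptions, not a description of how Git or DVC work internally; the function name `fingerprint` and the sample inputs are hypothetical and chosen only for illustration.

```python
import hashlib
import json


def fingerprint(code: str, data_rows: list, params: dict) -> str:
    """Return a short, deterministic hash of code, data, and parameters."""
    h = hashlib.sha256()
    h.update(code.encode("utf-8"))
    h.update(b"\0")  # separator so adjacent parts cannot blur together
    # Serialize with sorted keys so that dict ordering cannot change the hash.
    h.update(json.dumps(data_rows, sort_keys=True).encode("utf-8"))
    h.update(b"\0")
    h.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    return h.hexdigest()[:12]


# Hypothetical inputs: a code snippet, a tiny dataset, and one hyperparameter.
baseline = fingerprint("model = fit(X, y)", [[1, 2], [3, 4]], {"lr": 0.1})
same = fingerprint("model = fit(X, y)", [[1, 2], [3, 4]], {"lr": 0.1})
changed = fingerprint("model = fit(X, y)", [[1, 2], [3, 5]], {"lr": 0.1})

print(baseline == same)     # identical inputs give the same fingerprint
print(baseline == changed)  # one changed data value gives a different one
```

Recording such a fingerprint next to each result is one simple way to tell whether two analyses were run on the same code, data, and parameters.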

3.3. Automation

Data processing, model training, and deployment are key components of the analytical journey in data science, and automating them improves both efficiency and accuracy. By automating these repetitive tasks, data scientists free valuable time and cognitive resources from mundane activities and redirect their focus toward higher-order thinking and the more strategic aspects of their analyses. One of the key advantages of automation in data science is the mitigation of manual errors. When handled manually, repetitive tasks are prone to human errors that can greatly affect the accuracy of analyses.

Automation is a safeguard against such errors, ensuring that every step of the data science process is executed reliably and consistently [8], which enhances the precision of analyses and contributes to more reproducible and trustworthy results. Faster analysis is another key benefit of automating data science workflows. Complex tasks like data processing, which traditionally require a significant investment of effort and time, can be streamlined and executed swiftly through automated processes. Model training, a resource-intensive phase, can also be optimized for efficiency, enabling data scientists to experiment with various models in a more iterative and agile manner.
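
As a rough illustration of automating the steps above, the sketch below chains a cleaning step and a training step so that they always run in the same order. The step functions (`clean`, `train`), the runner (`run_pipeline`), and the toy data are all hypothetical; a production setup would typically use a dedicated workflow tool rather than a hand-written loop.

```python
from typing import Callable, List


def clean(rows: List[dict]) -> List[dict]:
    """Drop rows with a missing target value."""
    return [r for r in rows if r.get("y") is not None]


def train(rows: List[dict]) -> dict:
    """'Train' a trivial model: predict the mean of the target."""
    ys = [r["y"] for r in rows]
    return {"mean": sum(ys) / len(ys), "n_rows": len(ys)}


def run_pipeline(data, steps: List[Callable]):
    """Run each step in order, passing the output of one into the next."""
    result = data
    for step in steps:
        result = step(result)
        print(f"finished step: {step.__name__}")
    return result


# Toy input with one incomplete row that the cleaning step removes.
raw = [{"x": 1, "y": 2.0}, {"x": 2, "y": None}, {"x": 3, "y": 4.0}]
model = run_pipeline(raw, [clean, train])
print(model)
```

Because the order of steps is fixed in code, every run processes the data the same way, which is the consistency benefit the section describes.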

3.4. Continuous integration and continuous deployment

CI/CD is a set of best practices to ensure software's rapid and dependable release. When this principle gets applied to data science, it brings about a shift wherein the iterative nature of modeling and analysis is seamlessly merged into a cohesive and automated workflow. Adopting CI/CD practices in data science pushes teams to engage in swift iterations, thus allowing for dynamic adjustments in modeling and analysis in response to evolving data or emerging insights. This approach, akin to the frequent software releases in traditional DevOps settings, thus facilitates the seamless evolution of data science projects. As data evolves or models are refined, the CI/CD pipeline ensures that these changes are rigorously tested through automated testing and validation pipelines before deployment.

Automated testing in this process systematically validates each modification to the models or data so that the integrity of the analysis is maintained. This supports the reliability of the data-driven decision-making process and instills confidence in the outcomes, since adjustments have been thoroughly scrutinized before integration into the production environment. In addition, the CI/CD pipeline in data science ensures consistency and repeatability, which is key in promoting trust in decision-making [9]. Every workflow step, from data ingestion to model deployment, is subject to automated validation, thus minimizing the risk of introducing discrepancies and errors.
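
One way to picture automated validation before deployment is a gate that compares a candidate model against fixed criteria. The sketch below is illustrative only: the function names, the minimum accuracy of 0.8, and the tolerance of 0.01 are assumptions, not values taken from any cited work.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def validation_gate(candidate_acc, production_acc, min_acc=0.8, tolerance=0.01):
    """Return (ok, reason). The thresholds are illustrative assumptions."""
    if candidate_acc < min_acc:
        return False, "below minimum accuracy"
    if candidate_acc < production_acc - tolerance:
        return False, "regression against production model"
    return True, "passed"


# Hypothetical held-out labels and candidate predictions (9 of 10 correct).
labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
candidate_preds = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
acc = accuracy(candidate_preds, labels)
ok, reason = validation_gate(acc, production_acc=0.85)
print(acc, ok, reason)
```

In a CI/CD pipeline, a script like this would typically exit with a non-zero status when the gate fails, so that the deployment stage does not run.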

3.5. Infrastructure as Code (IaC)

IaC involves codifying infrastructure provisioning and treating infrastructure configurations as code artifacts. Extending these principles to data science is a strategic step that addresses the inherent challenge of creating and maintaining consistent analysis environments across the different stages of development and production. Adopting containerization and orchestration tools like Docker and Kubernetes further improves the application of IaC principles in data science. Containers package the entire environment required for data analysis, including dependencies, libraries, and runtime, into a single unit.

This compartmentalization ensures that the analysis environment remains consistent and can be effortlessly replicated across different computing environments [10]. In addition, orchestration tools like Kubernetes provide an efficient and scalable means of managing and deploying these containerized environments. This ensures that the uniformity achieved during development is seamlessly extended into the production stage. The ability to deploy, scale, and consistently manage data science workloads becomes a key advantage, promoting reliability and predictability in the deployment process.
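
To make "environment as code" concrete, the following sketch describes an analysis environment as a small data structure and renders it into Dockerfile text. It is a hedged illustration: the `EnvSpec` class, the `render_dockerfile` helper, the base image, the pinned package versions, and the `train.py` entry point are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class EnvSpec:
    """Declarative description of an analysis environment (hypothetical)."""
    base_image: str
    packages: List[str] = field(default_factory=list)
    entrypoint: str = "train.py"


def render_dockerfile(spec: EnvSpec) -> str:
    """Render the spec as Dockerfile text; the same spec gives the same text."""
    lines = [f"FROM {spec.base_image}", "WORKDIR /app"]
    if spec.packages:
        # Sort so that package order in the spec does not change the output.
        pinned = " ".join(sorted(spec.packages))
        lines.append(f"RUN pip install --no-cache-dir {pinned}")
    lines.append("COPY . /app")
    lines.append(f'CMD ["python", "{spec.entrypoint}"]')
    return "\n".join(lines) + "\n"


# Illustrative spec; image tag and version pins are assumptions.
spec = EnvSpec("python:3.11-slim", ["pandas==2.2.2", "scikit-learn==1.5.0"])
dockerfile_text = render_dockerfile(spec)
print(dockerfile_text)
```

Keeping the spec in version control means a change to the environment is reviewed and tracked like any other code change.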

3.6. Monitoring and feedback loops

This principle highlights the importance of continuous oversight and adaptability in analytical model development. Drawing a parallel with the vigilance applied to production applications in the DevOps space, monitoring and feedback loops in data science are a key practice for ensuring the ongoing relevance, accuracy, and efficacy of analytical models. Models are instrumental in extracting insights from data, and continuous monitoring of them mirrors the approach DevOps takes to applications in production.

This involves the systematic observation and measurement of key metrics associated with model performance, data distributions, and other relevant indicators [11]. Continuous monitoring helps detect deviations or shifts in these metrics that could affect the relevance and accuracy of the models. The significance of this principle lies in feedback mechanisms that feed the insights gained from monitoring back into the iterative development of data science models in a timely manner.

When shifts in data distribution or changes in model performance are detected, feedback loops trigger a response that often involves updates, recalibration, or even model retraining. This iterative feedback mechanism ensures that models remain adaptive to evolving conditions and continue to deliver accurate and relevant results over time. One of the main advantages of implementing monitoring and feedback loops in data science is the ability to address issues promptly before they escalate.

By continuously assessing the performance and relevance of models, organizations can identify and rectify anomalies or deviations early, ensuring data-driven decisions are based on the most accurate and up-to-date insights [12]. In addition, this iterative nature aligns with the development practices often employed in data science. It creates a cycle that is not static but evolves in response to changing data patterns and emerging trends.
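
A feedback loop of the kind described above can be sketched with a very simple drift check: measure how far the mean of recent data has moved from a reference sample, in units of the reference standard deviation. This is only one of many possible drift measures and is used here for illustration; the function names, the threshold of 2.0, and the sample values are assumptions, and the check assumes the reference sample is not constant.

```python
import statistics


def drift_score(reference, live):
    """Shift of the live mean from the reference mean, in reference std devs."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)  # assumes the reference varies
    return abs(statistics.mean(live) - ref_mean) / ref_std


def check_feature(reference, live, threshold=2.0):
    """Return the action a feedback loop might take; threshold is an assumption."""
    score = drift_score(reference, live)
    return "retrain" if score > threshold else "ok"


# Hypothetical feature values: training-time reference and two live windows.
reference = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0]
stable = [10.2, 9.8, 10.1, 9.9]
shifted = [14.0, 15.0, 13.5, 14.5]

print(check_feature(reference, stable))
print(check_feature(reference, shifted))
```

Run on a schedule, a check like this turns monitoring into an automatic trigger for recalibration or retraining instead of a manual inspection.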

4. Benefits of DevOps in Data Science

4.1. Faster time-to-insights

DevOps practices can expedite data science projects by dismantling barriers, optimizing workflows, and promoting a culture of continuous improvement. This streamlined approach reduces bottlenecks, allowing quicker iteration on model development and deployment. This, in turn, reduces time-to-insights, allowing organizations to gather valuable information from their data in a more timely manner.

4.2. Improved collaboration

As mentioned, DevOps principles encourage collaboration, and this ethos carries over to the data science space. Encouraging communication between development and data science teams nurtures a shared understanding of project goals. This collaboration breaks down silos and facilitates the creation of automated and reproducible data processing, modeling, and analysis pipelines, leading to faster and more efficient insights.

4.3. Reduced risks

One of the key benefits DevOps brings to data science is risk reduction. CI/CD practices minimize the risks of deploying faulty models or introducing errors into production systems. Stringent automated testing ensures that insights derived from data are accurate and reliable. Risk mitigation is key to maintaining the integrity of the decision-making process.

4.4. Scalability

Using DevOps practices in data science ensures scalability, which is crucial, especially with the ever-growing data volumes. Data pipelines and analysis workflows become capable of handling larger and more complex datasets. This scalability future-proofs data science projects and initiatives and positions organizations to leverage the full potential of the data assets.

4.5. Improved integration for maximum value extraction

The merger of data science and DevOps gives organizations a potent formula for maximizing the value derived from their data assets. While challenges such as the difficulties of model deployment and cultural shifts may exist, aligning these two disciplines empowers organizations to stay at the forefront of innovation in the data-driven era. This holistic integration of DevOps principles ensures efficiency and reliability and positions organizations to extract the maximum value from their data, driving success and innovation in this dynamic landscape.

5. Challenges and considerations

Integrating DevOps with data science brings benefits but also a number of challenges and considerations. The difficulties of dealing with varied and complex data and the need for specialized tools require proper planning [13]. This integration often requires adjustments to established workflows and new technologies that cater to the distinctive requirements of both the development and data domains. In addition, addressing cultural shifts within teams is a vital step in successfully implementing DevOps for data science.

Achieving collaboration requires promoting open communication, nurturing mutual understanding, and cultivating a willingness to bridge the gaps between traditionally distinct roles. Getting there depends on acknowledging and addressing these challenges. One major problem is the cultural shift involved in dismantling the traditional separation between development and data science, which requires a concerted effort to align goals and promote a culture of collaboration.

In addition, the need for data governance is significant, particularly concerning data privacy and compliance when data is shared across teams. Implementing stringent data governance practices is important to mitigate risks [14]. Adopting DevOps in data science also raises considerations related to tooling and skills. Data scientists may find themselves adapting to new tools and practices, while developers may need to acquaint themselves with fundamental data science concepts; bridging this knowledge gap is important for harmonious collaboration between these domains.

6. Enhancing collaboration with DevOps

DevOps incorporates different tasks like configuration management, continuous integration, deployment, testing, monitoring, and infrastructure provisioning. Traditionally, DevOps teams closely worked with development teams to manage the lifecycle of applications effectively. However, the integration of data science in DevOps brings forth new responsibilities. Data engineering, for instance, is a specialized category that deals with data pipelines, which needs close cooperation between DevOps and data science teams.

DevOps operators are tasked with provisioning strong clusters for technologies such as Apache Airflow or Apache Hadoop, which are key for data transformation and extraction [15]. Data engineers navigate different data sources, leveraging Big Data clusters and pipelines to transform raw data. Subsequently, data scientists use tools such as Power BI and Notebooks to visualize and explore transformed data. DevOps teams are essential in providing environments tailored for data visualization and exploration.

Building machine learning models is a major shift from traditional application development. It is a heterogeneous process that involves different languages, toolkits, libraries, and development environments, such as Python and Visual Studio Code [16]. DevOps teams must ensure these environments are readily available for data scientists and developers working on ML problems. The resource-intensive nature of machine learning and deep learning calls for substantial computing infrastructure with powerful CPUs and GPUs. DevOps handles tasks like provisioning, configuring, scaling, and managing clusters for frameworks like Microsoft CNTK and TensorFlow.

Automation is key in the DevOps toolkit, where it is used to streamline these processes, including creating and terminating instances. ML development, like modern application development, benefits from CI/CD best practices. Every version of an ML model is packaged as a container image, tagged distinctly, and managed through sophisticated CI/CD pipelines [17]. When a fully trained ML model is ready, DevOps teams are responsible for hosting it in a scalable environment, often leveraging orchestration engines like Kubernetes.
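
The practice of giving every model version its own image tag can be sketched as follows. The naming scheme (model name, version, and a short content digest) is one possible convention assumed for this example, and `image_tag`, the model name, and the byte strings standing in for model weights are hypothetical.

```python
import hashlib


def image_tag(model_name: str, version: str, model_bytes: bytes) -> str:
    """Build a tag like 'churn-model:1.4.0-<digest>' from name, version, content."""
    digest = hashlib.sha256(model_bytes).hexdigest()[:8]
    return f"{model_name}:{version}-{digest}"


# Two hypothetical builds that share a version string but differ in content.
tag_a = image_tag("churn-model", "1.4.0", b"weights-v1")
tag_b = image_tag("churn-model", "1.4.0", b"weights-v2")
print(tag_a)
print(tag_a != tag_b)  # different model content yields a different tag
```

Including a digest of the model content in the tag makes it harder for two different models to be published under the same name by accident.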

Containers and container management tools have significantly eased ML development, providing manageability and efficiency. DevOps teams use containers for several purposes, such as provisioning development environments, data processing pipelines, training infrastructure, and model deployment environments. Emerging technologies such as Kubeflow are specifically designed to help DevOps teams navigate the unique challenges posed by ML infrastructure [18]. ML adds a new dimension to DevOps, requiring collaborative efforts between operators, data scientists, developers, and data engineers to effectively support businesses embracing the ML paradigm.

7. Versioning and rollback

Versioning and rollback are key to the lifecycle management of deployed ML models. Versioning tracks the evolution of the models over time, capturing changes and improvements and allowing comprehensive comparisons between versions, including their inputs, outputs, parameters, metrics, and code. Rollback functionality, in turn, ensures swift restoration of a prior model version in case of errors, bugs, or performance deterioration. Together they maintain the ML model's quality, consistency, and reproducibility [19]. In addition, they empower organizations to adapt to changing data, evolving requirements, and valuable feedback, promoting agility in ML model management.

To achieve this, organizations can systematically capture and store all the dependencies and components that constitute the model: data, code, configuration, environment, artifacts, and metadata. Every model version needs a distinct identifier, a timestamp, and comprehensive documentation outlining the changes and the reasoning behind them. Tools such as MLflow can significantly help in the versioning process. These tools integrate with existing data sources, code repositories, and deployment platforms, presenting a single interface that efficiently manages model versions.
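
The record-keeping described above (a distinct identifier, a timestamp, and notes on what changed) can be sketched with a small in-memory registry. This is not the MLflow API; the `ModelVersion` and `Registry` classes, the field names, and the sample metrics are hypothetical and serve only to show the shape of the information being tracked.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class ModelVersion:
    """One recorded version of a model, with the fields the text lists."""
    name: str
    params: Dict[str, float]
    metrics: Dict[str, float]
    notes: str  # what changed and why
    version_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class Registry:
    """In-memory list of versions; a real setup would persist these."""

    def __init__(self) -> None:
        self.versions: List[ModelVersion] = []

    def register(self, version: ModelVersion) -> str:
        self.versions.append(version)
        return version.version_id

    def compare(self, id_a: str, id_b: str, metric: str) -> float:
        """Return metric(b) - metric(a) for two registered versions."""
        by_id = {v.version_id: v for v in self.versions}
        return by_id[id_b].metrics[metric] - by_id[id_a].metrics[metric]


# Hypothetical versions of a model named "churn".
registry = Registry()
v1 = registry.register(ModelVersion("churn", {"depth": 3}, {"auc": 0.81}, "baseline"))
v2 = registry.register(ModelVersion("churn", {"depth": 5}, {"auc": 0.84}, "deeper trees"))
print(round(registry.compare(v1, v2, "auc"), 2))
```

With identifiers and metrics stored together, comparing two versions or choosing one to restore becomes a lookup rather than a search through notebooks and logs.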

This contributes to the efficient development of model versions, thus emphasizing the merger between development and data science teams [20]. Implementing a rollback strategy for your ML model should entail establishing a mechanism that allows seamless switching between different model versions within the organization's production environment without compromising service performance and availability. Employing a deployment strategy that fully supports rollback is essential to achieve this.

An example is the rolling deployment strategy, where updates or new versions of the ML model are introduced incrementally across nodes or instances in the production environment. This incremental update allows for a gradual traffic transition from instances running the old version to those with the new version, thus ensuring a smooth deployment process [21]. Throughout this transition, key performance indicators are monitored carefully to verify that the new version operates as expected.
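
The rolling deployment described above can be simulated in a few lines. The sketch below updates a list of instances one batch at a time, runs a health check after each batch, and restores the previous state if the check fails. It is a simplified model under stated assumptions: real orchestrators manage traffic and instance lifecycles in far more detail, and the function names and the health-check rule are hypothetical.

```python
def rolling_deploy(instances, new_version, health_check, batch_size=1):
    """Update instances batch by batch; restore all of them if a check fails.

    `instances` is a list of version strings, one per running instance.
    Returns (final_instances, succeeded).
    """
    previous = list(instances)  # kept so the rollout can be undone
    current = list(instances)
    for start in range(0, len(current), batch_size):
        for i in range(start, min(start + batch_size, len(current))):
            current[i] = new_version
        if not health_check(current):
            return previous, False  # roll back to the prior state
    return current, True


def always_healthy(instances):
    return True


def fails_past_half(instances):
    # Hypothetical check: pretend errors appear once v2 serves most traffic.
    share_new = instances.count("v2") / len(instances)
    return share_new <= 0.5


fleet = ["v1", "v1", "v1", "v1"]
good_result = rolling_deploy(fleet, "v2", always_healthy)
bad_result = rolling_deploy(fleet, "v2", fails_past_half)
print(good_result)
print(bad_result)
```

Because the check runs after every batch, a faulty version is caught while part of the fleet is still on the old version, which limits the impact of a bad release.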

Most importantly, the rolling deployment strategy provides the flexibility to halt or roll back the deployment if any issues are detected or the new version fails to meet performance criteria. Once this is in place, it is important to validate and improve the effectiveness of the versioning and rollback processes. Rigorous testing of the ML model is important across the pre-deployment, deployment, and post-deployment stages. The testing protocol comprehensively examines the ML model's functionality, quality, and compatibility with key components such as environment, service, code, and data.

Most importantly, testing evaluates the impact and outcomes of versioning and rollback actions, such as performance metrics, stability, and user satisfaction within the model service [22]. This testing is a proactive measure that helps prevent and detect errors, bugs, and potential failures. It ensures the ML model aligns with the predefined requirements and meets the stakeholders' expectations. The framework is strengthened by integrating strong testing practices [23], thus fostering a collaborative environment that bridges the gap between development and data pipelines, highlighting reliability and quality assurance.

8. The future

The merger of DevOps and data science is poised to evolve along several key trends. Tools designed for the fusion of the two are expected to streamline processes and promote enhanced collaboration. As the collaboration advances, automation will expand across the whole data science workflow. This will include the automation of model development and deployment and the orchestration of end-to-end data science processes. Advanced automation tools will streamline repetitive tasks, allowing teams to focus on higher-value activities [24], enhancing efficiency and reducing the need for manual intervention.

In addition, the adoption of containerization technologies and microservices architecture will be more widespread. This will allow organizations to encapsulate data science applications and their dependencies within portable containers, thus facilitating scalable and consistent deployment across different environments. Subsequently, embracing these architectures will enable a modular and flexible approach to building and maintaining data science solutions.

MLOps, an extension of DevOps tailored to ML, will likewise grow to address the unique challenges of managing the entire ML model lifecycle. Organizations will increasingly focus on establishing strong practices for versioning, testing, deploying, monitoring, and maintaining ML models [26]. This will ensure that machine learning integrates seamlessly into broader DevOps practice, promoting a more efficient and holistic data-driven decision-making process.

The future will have specialized tools for DevOps in data science, catering to the unique needs of integrating data science workflows into DevOps pipelines, thus providing organizations with dedicated resources to navigate the difficulties of collaborative model deployments and development [27]. Artificial intelligence for operations (AIOps) will revolutionize the management of DevOps pipelines and processes. AI and ML algorithms will optimize decision-making processes, automate routine tasks, and provide predictive insights. This merge will enhance the efficiency and adaptability of DevOps practices, thus making them more responsive to dynamic operational needs.

9. Conclusion

Organizations that embrace the intersection of DevOps and data science are set to move to the forefront of this data-driven era. Bridging the traditional gap between development and data analytics allows teams to collaborate seamlessly [28], unlocking the true potential of data-driven insights. This merger accelerates time-to-insights and improves their reliability, precision, and impact on decision-making. DevOps for data science is both a collaboration and a transformative move that changes how organizations extract business value from their data [29]. This approach to building, deploying, and maintaining data-driven applications brings a new era of efficiency and innovation.

As organizations increasingly depend on data-driven decision-making, embracing DevOps is key to success in the competitive data-driven space. DevOps is a methodology that will continue to reshape the software development and deployment space. This shift emphasizing collaboration, automation, and continuous improvement allows organizations to deliver high-quality applications with reliability and speed. By bridging the gap between these two, DevOps teams will lean more towards shared goals [30], thus ensuring exceptional experiences and adaptability in case of any evolving demands of the digital world. Organizations embracing this collaboration will be equipped to navigate the challenges and opportunities of the data-driven future with the right innovative strategies.

References

[1] Ereth, J. (2018). DataOps-Towards a Definition. LWDA, 2191, 104-112.

[2] Lwakatare, L. E., Kuvaja, P., & Oivo, M. (2016). An exploratory study of DevOps: Extending the dimensions of DevOps with practices. ICSEA, 104, 2016.

[3] Grady, N. W., Payne, J. A., & Parker, H. (2017, December). Agile big data analytics: AnalyticsOps for data science. In 2017 IEEE international conference on big data (big data) (pp. 2331-2339). IEEE.

[4] Bou Ghantous, G., & Gill, A. (2017). DevOps: Concepts, practices, tools, benefits and challenges. PACIS2017.

[5] Ivanova, A., & Ivanova, P. (2018). Data analytics for DevOps effectiveness.

[6] Fokaefs, M., Barna, C., Veleda, R., Litoiu, M., Wigglesworth, J., & Mateescu, R. (2016, October). Enabling devops for containerized data-intensive applications: an exploratory study. In Proceedings of the 26th Annual International Conference on Computer Science and Software Engineering (pp. 138-148).

[7] Muñoz, M., & Díaz, O. (2017). DevOps: Foundations and its utilization in data center. Engineering and Management of Data Centers: An IT Service Management Approach, 205-225.

[8] Lwakatare, L. E., Kuvaja, P., & Oivo, M. (2016). Relationship of devops to agile, lean and continuous deployment: A multivocal literature review study. In Product-Focused Software Process Improvement: 17th International Conference, PROFES 2016, Trondheim, Norway, November 22-24, 2016, Proceedings 17 (pp. 399-415). Springer International Publishing.

[9] Angara, J., Prasad, S., & Sridevi, G. (2017). The factors driving testing in devops setting-a systematic literature survey. Indian Journal of Science and Technology, 9(48), 1-8.

[10] George, G., Osinga, E. C., Lavie, D., & Scott, B. A. (2016). Big data and data science methods for management research. Academy of Management Journal, 59(5), 1493-1507.

[11] Cao, L. (2017). Data science: a comprehensive overview. ACM Computing Surveys (CSUR), 50(3), 1-42.

[12] Vartak, M., & Madden, S. (2018). Modeldb: Opportunities and challenges in managing machine learning models. IEEE Data Eng. Bull., 41(4), 16-25.

[13] Baylor, D., Breck, E., Cheng, H. T., Fiedel, N., Foo, C. Y., Haque, Z., ... & Zinkevich, M. (2017, August). Tfx: A tensorflow-based production-scale machine learning platform. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1387-1395).

[14] Zaharia, M., Chen, A., Davidson, A., Ghodsi, A., Hong, S. A., Konwinski, A., ... & Zumar, C. (2018). Accelerating the machine learning lifecycle with MLflow. IEEE Data Eng. Bull., 41(4), 39-45.

[15] Wu, D., Zhu, L., Xu, X., Sakr, S., Sun, D., & Lu, Q. (2016). Building pipelines for heterogeneous execution environments for big data processing. IEEE Software, 33(2), 60-67.

[16] Wang, R., Sun, D., Li, G., Atif, M., & Nepal, S. (2016, December). Logprov: Logging events as provenance of big data analytics pipelines with trustworthiness. In 2016 IEEE International Conference on Big Data (Big Data) (pp. 1402-1411). IEEE.

[17] Steele, B., Chandler, J., & Reddy, S. (2016). Algorithms for data science (pp. 1-430). New York: Springer.

[18] Kotu, V., & Deshpande, B. (2018). Data science: concepts and practice. Morgan Kaufmann.

[19] McMaster, K., Wolthuis, S. L., Rague, B., & Sambasivam, S. (2018). A comparison of key concepts in data analytics and data science. Information Systems Education Journal, 16(1), 33.

[20] Draxl, C., & Scheffler, M. (2018). NOMAD: The FAIR concept for big data-driven materials science. MRS Bulletin, 43(9), 676-682.

[21] Liu, S., Wang, X., Liu, M., & Zhu, J. (2017). Towards better analysis of machine learning models: A visual analytics perspective. Visual Informatics, 1(1), 48-56.

[22] Gu, T., Dolan-Gavitt, B., & Garg, S. (2017). Badnets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733.

[23] Evtimov, I., Eykholt, K., Fernandes, E., Kohno, T., Li, B., Prakash, A., ... & Song, D. (2017). Robust physical-world attacks on machine learning models. arXiv preprint arXiv:1707.08945, 2(3), 4.

[24] Suthaharan, S. (2016). Machine learning models and algorithms for big data classification. Integr. Ser. Inf. Syst, 36, 1-12.

[25] Kim, B. (2015). Interactive and interpretable machine learning models for human machine collaboration (Doctoral dissertation, Massachusetts Institute of Technology).

[26] Kamuto, M. B., & Langerman, J. J. (2017, May). Factors inhibiting the adoption of DevOps in large organisations: South African context. In 2017 2nd IEEE international conference on recent trends in electronics, information & communication technology (RTEICT) (pp. 48-51). IEEE.

[27] Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics, 26(4), 745-766.

[28] Marjani, M., Nasaruddin, F., Gani, A., Karim, A., Hashem, I. A. T., Siddiqa, A., & Yaqoob, I. (2017). Big IoT data analytics: architecture, opportunities, and open research challenges. IEEE Access, 5, 5247-5261.

[29] Mottin, D., Lissandrini, M., Velegrakis, Y., & Palpanas, T. (2017). New trends on exploratory methods for data analytics. Proceedings of the VLDB Endowment, 10(12), 1977-1980.

[30] Karpatne, A., Atluri, G., Faghmous, J. H., Steinbach, M., Banerjee, A., Ganguly, A., ... & Kumar, V. (2017). Theory-guided data science: A new paradigm for scientific discovery from data. IEEE Transactions on knowledge and data engineering, 29(10), 2318-2331.
