Data science combines several competencies: statistics, industry expertise and computer science. The computer science part applies scientific procedures and methods to analyse data and extract added value from it. Programming in the data science context therefore aims to make it easy to process and analyse data and to generate forecasts from it.
Our experts therefore focus on data-centric programming languages such as R and Python, combined with Jupyter Notebooks and specialist libraries such as numpy, pandas, scikit-learn, keras, tensorflow, prophet, OpenCV or Shiny.
Before we implement use cases together with you in large-scale projects, we offer you the opportunity to quickly test the first use cases in a proof of concept (PoC). We recommend the open source programming languages Python and R for this purpose.
Open source provides two key advantages:
Another advantage of Python and R is that almost all major vendors, such as Microsoft, SAS, SAP or Cloudera, provide interfaces for these programming languages. This means that data science models developed in the PoC can be easily transferred to the production system. For this reason, we continually train our data scientists in programming, data analysis and visualisation and have already been able to apply our expertise in many projects.
Python is currently the most widely used programming language for machine learning and is particularly suitable for the development of sophisticated models and prediction modules that can be directly integrated into production systems. This is why our employees are proficient in all important Python libraries in the data science field.
The scikit-learn library offers a broad range of classical machine learning algorithms and methods. These range from common regression and clustering methods to classification and prediction models.
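As a minimal sketch of these three method families, the following Python example fits a regression, a clustering and a classification model with scikit-learn. The data is synthetic and all variable names and parameter values are illustrative assumptions, not taken from a real project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data only: every value below is an assumption for illustration.
rng = np.random.default_rng(0)

# Regression: recover a known linear relationship y = 3x + 2 with small noise.
X_reg = rng.uniform(0, 10, size=(200, 1))
y_reg = 3.0 * X_reg[:, 0] + 2.0 + rng.normal(0, 0.1, size=200)
reg = LinearRegression().fit(X_reg, y_reg)
slope = float(reg.coef_[0])
intercept = float(reg.intercept_)

# Clustering: group unlabeled points around three well-separated centers.
centers = [[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]]
X_blobs, y_blobs = make_blobs(
    n_samples=300, centers=centers, cluster_std=0.5, random_state=0
)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_blobs)
cluster_labels = kmeans.labels_

# Classification: train on labeled points, then score on held-out points.
X_train, X_test, y_train, y_test = train_test_split(
    X_blobs, y_blobs, test_size=0.25, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"clusters found: {len(set(cluster_labels))}")
print(f"held-out accuracy: {accuracy:.2f}")
```

All three models share the same fit and predict interface, which is one reason scikit-learn lends itself to quick comparisons of different methods in a PoC.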
Depending on the complexity of the problem, deep learning methods can also be used. Our data scientists are certified in this field and have extensive expertise in the keras, tensorflow and pytorch libraries. Given their flexibility, these libraries are especially suitable for implementing your own deep learning solutions.
R is a powerful and flexible scripting language used especially for the analysis and visualisation of data. A large part of all new developments in the fields of statistics and machine learning takes place in R, which enables our experts to constantly test new statistical methods.
In addition, R is very easy to automate and integrate, for example with Git, ODBC, Oracle R Enterprise, Spark or Hadoop. Like Python, R has a large number of libraries. data.table lends itself particularly well to the preparation of large amounts of data. A wide range of machine learning methods, including neural networks, can be applied with the help of the caret library. We can create very impressive visualisations with libraries like ggplot2 or plotly. If a use case is subsequently to go into production as an app, our specialists use the Shiny library.