R, Julia, Python: today's data scientists can choose between numerous programming languages, each with its own strengths and weaknesses. Would it not be convenient to bring those languages together and use the individual strengths of each? The “reticulate” package for R takes a step in this direction.
The reticulate package creates a Python connection for R, which enables the user to combine both languages for an optimal result. In this tutorial on the Python engine for R Markdown implemented by reticulate, we illustrate the basic idea, functionality and usefulness of the engine.
As an in-depth tutorial with examples and a fictional use case, we include an HTML notebook which is created in R Markdown. You can find it here.
The weapon of choice – data analysis in your day-to-day work
It could all be so simple: everyone speaks the same language, borders blur and collaboration becomes easier than ever. That may sound a bit dramatic, but language barriers are a common situation in the everyday life of a data scientist. Although R shows its strengths from a statistical point of view, Python, as a multi-purpose language, is becoming increasingly popular in the data science scene and is more and more often the programming language of choice. It therefore makes sense to build a bridge between the two languages and create a common space within a project.
The virtual bridge – the reticulate package
The reticulate package provides a Python connection for R. Users can execute Python code directly from the R console, load Python scripts into the R environment and make them usable for R. In addition, reticulate provides a Python engine for R Markdown, which is described in the following example in more detail.
Introduction – setup & Python chunks in R Markdown
The setup of reticulate is simple. After installing the package and your preferred Python distribution (2.7.x or 3.x), you can register the Python engine with:

    library(reticulate)
    knitr::knit_engines$set(python = reticulate::eng_python)

With knitr 1.18 or later, this engine is enabled by default whenever reticulate is installed, so the explicit call is only needed on older versions. Chunks then work as usual, and a Python chunk is written just like an R chunk.
For example, an R chunk:

    ```{r testChunk_R, warning = FALSE, message = FALSE}
    Code
    ```

and a Python chunk:

    ```{python TestChunk_Py, warning = FALSE, message = FALSE}
    Code
    ```

A word of caution: code executed in a Python chunk is not loaded into the global environment. Functions and variables defined there can only be accessed from outside when the document is knitted; running a single chunk on its own is not enough. Our corresponding HTML notebook contains further examples of implementing functions in the different languages.
Continuation – automatic type conversion and interconnectivity
In short, reticulate builds a bridge between the two languages. Developers with different programming backgrounds can work in the same document of a joint project without having to translate code, and the strengths of both languages can be used. To loosen the boundaries further, reticulate also makes it possible to access the variables of one language from within a chunk of the other.
The following code extract illustrates this:

    ```{r chunk1_R}
    test_vector <- c(1, 2, 4, 4, 7, 3, 5)
    ```

    ```{python chunk1_Py}
    # R objects are reachable from Python through the r object
    num_four = r.test_vector.count(4)
    print("The vector contains", num_four, "times the number 4")
    ```

    ```{r chunk2_R}
    # Python objects are reachable from R through the py object
    out_string <- paste("Correct, the vector contains", py$num_four, "times the number 4")
    print(out_string)
    ```
reticulate automatically converts types between the two languages. For example, when chunk1_Py reads test_vector, the R vector is converted into the equivalent Python type, a list.
Further type conversions can be found here.
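To make the conversion concrete, the following sketch imitates the Python side of chunk1_Py without reticulate. The list literal is a stand-in for r.test_vector, under the assumption that R's numeric (double) vector arrives in Python as a list of floats; it is an illustration, not output captured from the notebook.

```python
# Stand-in for r.test_vector: c(1, 2, 4, 4, 7, 3, 5) is a double vector in R,
# so it is assumed here to arrive in Python as a list of floats.
test_vector = [1.0, 2.0, 4.0, 4.0, 7.0, 3.0, 5.0]

# The object behaves like any other Python list.
print(type(test_vector).__name__)   # list

# count(4) still matches the float entries because 4 == 4.0 in Python.
num_four = test_vector.count(4)
print(num_four)                     # 2

# The result is a plain Python int.
print(type(num_four).__name__)      # int
```

Because the values are floats rather than ints, Python code that strictly requires integers (for example, list indexing) would need an explicit int() conversion first.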
source_python – the “hidden treasure” among functions
The ability to write Python code directly in a document and use it across languages already gives us a powerful tool for seamless collaboration. The hidden treasure of the reticulate package, however, is the function source_python. It reads a Python script and makes the script's functions and variables available to both languages within the markdown document. The following example illustrates how this works:
    # File: bubblesort.py
    def bsort(numbers):
        # Validate the input: a non-empty list containing only numbers
        if (not isinstance(numbers, list)
                or not all(isinstance(p, (int, float)) for p in numbers)
                or len(numbers) == 0):
            print("Warning: Input must be of type list, only numeric and have length >= 1")
            return
        # After each outer pass the largest remaining element sits at index i
        for i in reversed(range(1, len(numbers))):
            for j in range(0, i):
                if numbers[j] > numbers[j + 1]:
                    numbers = swap(numbers, j, j + 1)
        return numbers

    def swap(numbers, id1, id2):
        # Exchange the elements at positions id1 and id2
        temp = numbers[id1]
        numbers[id1] = numbers[id2]
        numbers[id2] = temp
        return numbers

R Markdown document:

    ```{r setup_script}
    library(reticulate)
    source_python("bubblesort.py")
    ```

    ```{r sort_R}
    bsort(c(4, 3, 2, 1))
    ```

Output: 1, 2, 3, 4

    ```{python sort_Py}
    print(bsort([4, 3, 2, 1]))
    ```

Output: 1, 2, 3, 4
The Python script bubblesort.py implements the bubble sort algorithm in the function bsort(numbers). The markdown document loads this implementation with source_python("bubblesort.py"), after which both languages can use it, as shown above.
With this feature, reticulate removes the last remaining limitation. Developers can work entirely in their preferred development environment and merge the results into one final document at the end. Existing Python scripts (e.g. scripts for database connections, deep learning algorithms, etc.) can be carried over seamlessly.
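Before sourcing a script like this from R, it can help to exercise it in plain Python. The sketch below repeats the bsort logic from above in compact form (using tuple assignment inside swap, a choice of this sketch) and checks two properties that matter when calling it from both languages: the list is sorted in place, and invalid input yields None. The float input is an assumed stand-in for what R's c(4, 3, 2, 1) looks like on the Python side.

```python
def swap(numbers, id1, id2):
    """Exchange two elements of the list in place and return the list."""
    numbers[id1], numbers[id2] = numbers[id2], numbers[id1]
    return numbers


def bsort(numbers):
    """Bubble sort with the same validation idea as bubblesort.py."""
    is_numeric_list = (
        isinstance(numbers, list)
        and len(numbers) > 0
        and all(isinstance(p, (int, float)) for p in numbers)
    )
    if not is_numeric_list:
        print("Warning: input must be a non-empty list of numbers")
        return None
    # After each outer pass the largest remaining element sits at index i.
    for i in reversed(range(1, len(numbers))):
        for j in range(i):
            if numbers[j] > numbers[j + 1]:
                swap(numbers, j, j + 1)
    return numbers


# Assumed stand-in for the list Python receives for R's c(4, 3, 2, 1)
data = [4.0, 3.0, 2.0, 1.0]
result = bsort(data)

print(result)           # [1.0, 2.0, 3.0, 4.0]
print(result is data)   # True: the input list itself was reordered
print(bsort("4321"))    # prints the warning, then None
```

The in-place behaviour only matters for callers in Python chunks, which pass their own list object and will see it reordered afterwards.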
Use case – text mining with preparatory work
Consider the following fictional use case: the electronics retailer Elektro-X wants to sign an advertising contract with one of the three retailers Lenavu, Ace-Ar and AS-OS. To get a more detailed picture of how satisfied customers are with each of them, Elektro-X starts a text mining project implemented in Python: shop reviews are read in and searched for reviews of the retailers' products. In addition, a function is provided for calculating the average rating of several reviews. Due to a lack of time, Elektro-X is unable to follow through with the project and hires eoda to finish it. eoda completes the analysis in R to gain access to ggplot for visualization while reusing the preparatory work done in Python, which is brought into R with the reticulate package. R is also used to visualize the evaluations and the sentiment analysis.
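The kind of Python preparatory work described here could look roughly like the following sketch. Everything in it is hypothetical: the function names reviews_for and average_rating, the review structure (a dict with "text" and "rating") and the sample reviews are invented for illustration and are not taken from the Elektro-X project or the notebook.

```python
def reviews_for(reviews, retailer):
    """Return the reviews whose text mentions the given retailer (case-insensitive)."""
    name = retailer.lower()
    return [review for review in reviews if name in review["text"].lower()]


def average_rating(reviews):
    """Return the mean rating of the reviews, or None if there are none."""
    ratings = [review["rating"] for review in reviews]
    if not ratings:
        return None
    return sum(ratings) / len(ratings)


# Invented sample data, for illustration only
reviews = [
    {"text": "The Lenavu laptop works great", "rating": 5},
    {"text": "My Ace-Ar monitor flickers", "rating": 2},
    {"text": "Lenavu support was slow", "rating": 3},
]

lenavu_reviews = reviews_for(reviews, "Lenavu")
print(len(lenavu_reviews))              # 2
print(average_rating(lenavu_reviews))   # 4.0
print(average_rating([]))               # None
```

A script of this shape could then be loaded with source_python, as in the bubblesort.py example, so that the remaining analysis and the ggplot visualization can be written in R.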
Conclusion – the future of big data
In conclusion, packages like reticulate have the potential to significantly influence the big data industry and thus change its future. Service providers like eoda in particular can benefit from barrier-free collaboration with their customers, because they can get to the core problems sooner without first having to translate existing code. If you would like to work more efficiently on customer projects in the future, consider solutions to language barriers such as reticulate.