3.4. Moving Data between Java and R Code

If you have read the chapter Evaluating R Language Code in this guide, you already know how to execute R code from a Java application. In this chapter we take things a little further and explain how you can move data between Java and R code.

Renjin provides a mapping from R language types to Java objects. To use this mapping effectively you should have at least a basic understanding of R’s object types. The next section provides a short introduction which is essentially a condensed version of the relevant material in the R Language Definition manual. If you are already familiar with R’s object types you can skip this section and head straight to the section Pulling data from R into Java or Pushing data from Java to R.

3.4.1. A Java Developer’s Guide to R Objects

R has a number of object types that are referred to as basic types. Of these, we only discuss those that are most frequently encountered by users of R: vectors, lists, functions, and the NULL object. We also discuss the two common compound objects in R, namely data frames and factors.

3.4.1.1. Attributes

Before we discuss these objects, it is important to know that all objects except the NULL object can have one or more attributes. Common attributes are the names attribute, which contains the element names, the class attribute, which stores the name of the class of the object, and the dim attribute, which stores the size of each dimension, with its optional dimnames companion, which stores the names along each dimension. For each object, the attributes() function returns a list with the attributes and their values. The value of a specific attribute can be obtained using the attr() function. For example, attr(x, "class") returns the name of the class of the object (or NULL if the attribute is not defined).

3.4.1.2. Vectors

There are six basic vector types which are referred to as the atomic vector types. These are:

logical:
a boolean value (for example: TRUE)
integer:
an integer value (for example: 1)
double:
a real number (for example: 1.5)
character:
a character string (for example: "foobar")
complex:
a complex number (for example: 1+2i)
raw:
uninterpreted bytes (forget about this one)

These vectors have a length and can be indexed using [ as the following sample R session demonstrates:

> x <- 2
> length(x)
[1] 1
> y <- c(2, 3)
> y[2]
[1] 3

As you can see, even single numbers are vectors with length equal to one. Vectors in R can have missing values that are represented as NA. Because all elements in a vector must be of the same type (i.e. logical, integer, double, etc.) there are multiple types of NA. However, the casual R user will generally not be concerned with the different types of NA.

> x <- c(1, NA, 3)
> x
[1]  1 NA  3
> y <- as.character(NA)
> y
[1] NA
> typeof(NA) # default type of NA is logical
[1] "logical"
> typeof(y) # but we have coerced 'y' to a character vector
[1] "character"

R’s typeof() function returns the internal type of each object. In the example above, y is a character vector.
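For Java developers it can help to see how such typed missing values can be encoded, since Java's int and double have no built-in notion of "missing". The sketch below follows the convention used by GNU R, where the integer NA is the smallest 32-bit integer and the double NA is a NaN whose lower 32 bits hold the value 1954. The class and method names are hypothetical and the code is plain Java; consult Renjin's IntVector and DoubleVector classes for the constants and helpers that Renjin itself provides.

```java
// Plain-Java sketch of typed NA sentinels; names are hypothetical, not Renjin API.
final class NaSketch {
    // GNU R encodes the integer NA as the smallest 32-bit integer.
    static final int NA_INTEGER = Integer.MIN_VALUE;
    // GNU R encodes the double NA as a NaN whose lower 32 bits are 1954 (0x7A2).
    static final double NA_REAL = Double.longBitsToDouble(0x7FF00000000007A2L);

    static boolean isNA(int value) {
        return value == NA_INTEGER;
    }

    static boolean isNA(double value) {
        // Every NA is a NaN, but only the NaN carrying the 1954 payload is an NA.
        long lowerWord = Double.doubleToRawLongBits(value) & 0xFFFFFFFFL;
        return Double.isNaN(value) && lowerWord == 1954L;
    }

    public static void main(String[] args) {
        double[] x = {1.0, NA_REAL, 3.0, Double.NaN};
        for (double v : x) {
            System.out.println(v + " -> isNA: " + isNA(v));
        }
    }
}
```

The practical consequence is that a plain Double.isNaN() check cannot tell a missing value apart from the result of an invalid computation such as 0/0, so code that cares about the difference has to inspect the payload.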

3.4.1.3. Factors

Factors are one of R’s compound data types. Internally, they are represented by integer vectors with a levels attribute. The following sample R session creates such a factor from a character vector:

> x <- sample(c("A", "B", "C"), size = 10, replace = TRUE)
> x
 [1] "C" "B" "B" "C" "A" "A" "B" "B" "C" "B"
> as.factor(x)
 [1] C B B C A A B B C B
Levels: A B C

Internally, the factor in this example is stored as the integer vector c(3, 2, 2, 3, 1, 1, 2, 2, 3, 2), whose values are the indices of the letters in the character vector c("A", "B", "C") stored in the levels attribute.
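The mapping from integer codes back to labels can be sketched in a few lines of plain Java. The class and method names below are hypothetical and not part of Renjin; the sketch also assumes that the factor contains no missing values, which would need separate handling.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of how a factor's integer codes map to its levels.
// The class and method names are hypothetical, not part of Renjin.
final class FactorSketch {
    static List<String> decode(int[] codes, String[] levels) {
        List<String> labels = new ArrayList<>(codes.length);
        for (int code : codes) {
            // R indices are 1-based, Java arrays are 0-based.
            labels.add(levels[code - 1]);
        }
        return labels;
    }

    public static void main(String[] args) {
        int[] codes = {3, 2, 2, 3, 1, 1, 2, 2, 3, 2};
        String[] levels = {"A", "B", "C"};
        // prints [C, B, B, C, A, A, B, B, C, B]
        System.out.println(decode(codes, levels));
    }
}
```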

3.4.1.4. Lists

Lists are R’s go-to structure for representing composite data. They can contain multiple elements, each of which can be of a different type. Record-like structures can be created by naming each element in the list. The lm() function, for example, returns a list that contains many details about the fitted linear model. The following R session shows the difference between a list and a list with named elements:

> l <- list("Jane", 23, c(6, 7, 9, 8))
> l
[[1]]
[1] "Jane"

[[2]]
[1] 23

[[3]]
[1] 6 7 9 8

> l <- list(name = "Jane", age = 23, scores = c(6, 7, 9, 8))
> l
$name
[1] "Jane"

$age
[1] 23

$scores
[1] 6 7 9 8

In R, lists are also known as generic vectors. They have a length that is equal to the number of elements in the list.

3.4.1.5. Data frames

Data frames are one of R’s compound data types. They are lists of vectors, factors and/or matrices, all having the same length. The data frame is one of the most important concepts in statistics and has equivalent implementations in SAS and SPSS.

The following sample R session shows how a data frame is constructed, what its attributes are and that it is indeed a list:

> df <- data.frame(x = seq(5), y = runif(5))
> df
  x         y
1 1 0.8773874
2 2 0.4977048
3 3 0.6719721
4 4 0.2135386
5 5 0.3834681
> class(df)
[1] "data.frame"
> attributes(df)
$names
[1] "x" "y"

$row.names
[1] 1 2 3 4 5

$class
[1] "data.frame"

> is.list(df)
[1] TRUE
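Seen from Java, the key point is that a data frame is column-oriented: each named element is a whole column, and a row only exists as "element i of every column". The following plain-Java sketch models this with a map of equal-length arrays. It is a simplified illustration with hypothetical names, restricted to numeric columns, and does not show how Renjin stores a data frame.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch of a data frame as named, equal-length columns.
// Names are hypothetical; this is a simplified model, not Renjin API.
final class DataFrameSketch {
    // Returns the number of rows, or throws if the columns differ in length.
    static int rowCount(Map<String, double[]> columns) {
        int rows = -1;
        for (Map.Entry<String, double[]> column : columns.entrySet()) {
            int length = column.getValue().length;
            if (rows >= 0 && length != rows) {
                throw new IllegalArgumentException("Column '" + column.getKey()
                    + "' has length " + length + ", expected " + rows);
            }
            rows = length;
        }
        return Math.max(rows, 0);
    }

    // Collects row i (zero-based) by reading element i of every column.
    static double[] row(Map<String, double[]> columns, int i) {
        double[] result = new double[columns.size()];
        int j = 0;
        for (double[] column : columns.values()) {
            result[j++] = column[i];
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, double[]> df = new LinkedHashMap<>();
        df.put("x", new double[] {1, 2, 3});
        df.put("y", new double[] {0.5, 0.25, 0.125});
        System.out.println("rows: " + rowCount(df));
        System.out.println("second row: " + Arrays.toString(row(df, 1)));
    }
}
```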

3.4.1.6. Matrices and arrays

Besides one-dimensional vectors, R also knows two other classes to represent array-like data types: matrix and array. A matrix is simply an atomic vector with a dim attribute that contains a numeric vector of length two:

> x <- seq(9)
> class(x)
[1] "integer"
> dim(x) <- c(3, 3)
> class(x)
[1] "matrix"
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

Likewise, an array is also a vector with a dim attribute that contains a numeric vector of length greater than two:

> y <- seq(8)
> dim(y) <- c(2,2,2)
> class(y)
[1] "array"

The example with the matrix shows that the elements in an array are stored in column-major order which is important to know when we want to access R arrays from a Java application.
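Column-major order means that the element in row r and column c (both zero-based) of a matrix with n rows sits at position c * n + r of the underlying vector. The plain-Java sketch below applies this rule to the values of the 3x3 matrix from the example; the class and method names are hypothetical and not part of Renjin.

```java
// Plain-Java sketch of column-major indexing; names are hypothetical, not Renjin API.
final class ColumnMajorSketch {
    // Zero-based position of element (row, col) in the flat vector.
    static int index(int row, int col, int numRows) {
        return col * numRows + row;
    }

    public static void main(String[] args) {
        // The values of seq(9), in the order in which R stores a 3x3 matrix.
        int[] values = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        int numRows = 3;
        int numCols = 3;
        for (int row = 0; row < numRows; row++) {
            StringBuilder line = new StringBuilder();
            for (int col = 0; col < numCols; col++) {
                line.append(values[index(row, col, numRows)]).append(' ');
            }
            // prints "1 4 7", then "2 5 8", then "3 6 9"
            System.out.println(line.toString().trim());
        }
    }
}
```

Note that this is the transpose of what a Java developer gets by default from a double[][] array, which is laid out row by row.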

Note

In both examples for the matrix and array objects, the class() function derives the class from the fact that the object is an atomic vector with the dim attribute set. Unlike data frames, these objects do not have a class attribute.

3.4.2. Overview of Renjin’s type system

Renjin has corresponding classes for all of the R object types discussed in the section A Java Developer’s Guide to R Objects. Table Renjin’s Java classes for common R object types summarizes these object types and their Java classes. In R, the object type is returned by the typeof() function.

Renjin’s Java classes for common R object types
R object type    Renjin class
logical          LogicalVector
integer          IntVector
double           DoubleVector
character        StringVector
complex          ComplexVector
raw              RawVector
list             ListVector
function         Function
environment      Environment
NULL             Null

There is a certain hierarchy in Renjin’s Java classes for the different object types in R. Figure Hierarchy in Renjin’s type system gives a full picture of all classes that make up Renjin’s type system. These classes are contained in the org.renjin.sexp Java package. The vector classes listed in table Renjin’s Java classes for common R object types are in fact abstract classes that can have different implementations. For example, the DoubleArrayVector (not shown in the figure) is an implementation of the DoubleVector abstract class. The SEXP, Vector, and AtomicVector classes are all Java interfaces.

Note

Renjin does not have classes for all classes of objects that are known to (base) R. This includes objects of class matrix and array, which are represented by one of the AtomicVector classes, and R’s compound objects factor and data.frame, which are represented by an IntVector and a ListVector respectively.

Figure: Hierarchy in Renjin’s type system

3.4.3. Pulling data from R into Java

Now that you have a good understanding of both R’s object types and how these types are mapped to Renjin’s Java classes, we can start by pulling data from R code into our Java application. A typical scenario is one where an R script performs a calculation and the result is pulled into the Java application for further processing.

Using the Renjin Script Engine as introduced in the chapter Evaluating R Language Code, we can store the result of a calculation from R in a Java object. The eval() method of javax.script.ScriptEngine is declared to return an Object, i.e. Java’s root superclass. With Renjin we can always cast this result to a SEXP object. The following Java snippet shows how this is done and how the Object.getClass() and Class.getName() methods can be used to determine the actual class of the R result:

// evaluate Renjin code from String:
SEXP res = (SEXP)engine.eval("a <- 2; b <- 3; a*b");

// print the result to stdout:
System.out.println("The result of a*b is: " + res);
// determine the Java class of the result:
Class<?> objectType = res.getClass();
System.out.println("Java class of 'res' is: " + objectType.getName());
// use the getTypeName() method of the SEXP object to get R's type name:
System.out.println("In R, typeof(res) would give '" + res.getTypeName() + "'");

This should write the following to the standard output:

The result of a*b is: 6.0
Java class of 'res' is: org.renjin.sexp.DoubleArrayVector
In R, typeof(res) would give 'double'

As you can see, the getTypeName() method of the SEXP interface returns a String with R’s name for the object type.

Note

Don’t forget to import org.renjin.sexp.* to make Renjin’s type classes available to your application.

In the example above we could have also cast R’s result to a DoubleVector object:

DoubleVector res = (DoubleVector)engine.eval("a <- 2; b <- 3; a*b");

or you could cast it to a Vector:

Vector res = (Vector)engine.eval("a <- 2; b <- 3; a*b");

You can’t cast R integer results to a DoubleVector: the following snippet will throw a ClassCastException:

// use R's 'L' suffix to define an integer:
DoubleVector res = (DoubleVector)engine.eval("1L");

3.4.3.1. Accessing individual elements of vectors

Now that we know how to pull R objects into our Java application we want to work with these data types in Java. In this section we show how individual elements of the Vector objects can be accessed in Java.

As you know, each vector type in R, and thus also in Renjin, has a length which can be obtained with the length() method. Individual elements of a vector can be obtained with the getElementAsXXX() methods where XXX is one of Double, Int, String, Logical, and Complex. The following snippet demonstrates this:

Vector x = (Vector)engine.eval("x <- c(6, 7, 8, 9)");
System.out.println("The vector 'x' has length " + x.length());
for (int i = 0; i < x.length(); i++) {
    System.out.println("Element x[" + (i + 1) + "] is " + x.getElementAsDouble(i));
}

This will write the following to the standard output:

The vector 'x' has length 4
Element x[1] is 6.0
Element x[2] is 7.0
Element x[3] is 8.0
Element x[4] is 9.0

As we have seen in the Lists section above, lists in R are also known as generic vectors, but because their elements are themselves vectors, accessing the individual values requires a bit more care. If an element (i.e. a vector) of a list has length equal to one, we can access this element directly using one of the getElementAsXXX() methods. For example:

ListVector x =
    (ListVector)engine.eval("x <- list(name = \"Jane\", age = 23, scores = c(6, 7, 8, 9))");
System.out.println("List 'x' has length " + x.length());
// directly access the first (and only) element of the vector 'x$name':
System.out.println("x$name is '" + x.getElementAsString(0) + "'");

which will result in:

List 'x' has length 3
x$name is 'Jane'

being printed to standard output. However, this approach will not work for the third element of the list as this is a vector with length greater than one. The preferred approach for lists is to get each element as a SEXP object first and then to handle each of these accordingly. For example:

DoubleVector scores = (DoubleVector)x.getElementAsSEXP(2);

3.4.3.2. Dealing with matrices

As described in the section Matrices and arrays above, matrices are simply vectors with the dim attribute set to an integer vector of length two. To identify a matrix in Renjin, we therefore need to check for the presence of this attribute and its value. Since any object in R can have one or more attributes, the SEXP interface defines a number of methods for dealing with attributes. In particular, hasAttributes() will return true if there are any attributes defined in an object and getAttributes() will return these attributes as an AttributeMap.

Vector res = (Vector)engine.eval("matrix(seq(9), nrow = 3)");
if (res.hasAttributes()) {
    AttributeMap attributes = res.getAttributes();
    Vector dim = attributes.getDim();
    if (dim == null) {
        System.out.println("Result is a vector of length " +
            res.length());

    } else {
        if (dim.length() == 2) {
            System.out.println("Result is a " +
                dim.getElementAsInt(0) + "x" +
                dim.getElementAsInt(1) + " matrix.");
        } else {
            System.out.println("Result is an array with " +
                dim.length() + " dimensions.");
        }
    }
}

Output:

Result is a 3x3 matrix.

For convenience, Renjin includes a wrapper class Matrix that provides easier access to the number of rows and columns.

Example:

// required import(s):
import org.renjin.primitives.matrix.*;

Vector res = (Vector)engine.eval("matrix(seq(9), nrow = 3)");
try {
    Matrix m = new Matrix(res);
    System.out.println("Result is a " + m.getNumRows() + "x"
        + m.getNumCols() + " matrix.");
} catch(IllegalArgumentException e) {
    System.out.println("Result is not a matrix: " + e);
}

Output:

Result is a 3x3 matrix.

3.4.3.3. Dealing with lists and data frames

The ListVector class contains several convenience methods to access a list’s components from Java. For example, we can extract a component from a fitted linear model using the name of the element that contains it:

ListVector model = (ListVector)engine.eval("x <- 1:10; y <- x*3; lm(y ~ x)");
Vector coefficients = model.getElementAsVector("coefficients");
// same result, but less convenient:
// int i = model.indexOfName("coefficients");
// Vector coefficients = (Vector)model.getElementAsSEXP(i);

System.out.println("intercept = " + coefficients.getElementAsDouble(0));
System.out.println("slope = " + coefficients.getElementAsDouble(1));

Output:

intercept = -4.4938668397781774E-15
slope = 3.0

3.4.4. Handling errors generated by the R code

Up to now we have been able to execute R code without any concern for possible errors that may occur when the R code is evaluated. There are two common exceptions that may be thrown by the R code:

  1. ParseException: an exception thrown by Renjin’s R parser due to a syntax error and
  2. EvalException: an exception thrown by Renjin when the R code generates an error condition, for example by the stop() function.

Here is an example which catches an exception from Renjin’s parser:

// required import(s):
import org.renjin.parser.ParseException;

try {
    engine.eval("x <- 1 +/ 1");
} catch (ParseException e) {
    System.out.println("R script parse error: " + e.getMessage());
}

Output:

R script parse error: Syntax error at line 1 char 0: syntax error, unexpected '/'

And here’s an example which catches an error condition thrown by the R interpreter:

// required import(s):
import org.renjin.eval.EvalException;

try {
    engine.eval("stop(\"Hello world!\")");
} catch (EvalException e) {
    // getCondition() returns the condition as an R list:
    Vector condition = (Vector)e.getCondition();
    // the first element of the list contains the actual error message:
    String msg = condition.getElementAsString(0);
    System.out.println("The R script threw an error: " + msg);
}

Output:

The R script threw an error: Hello world!

EvalException.getCondition() is required to pull the condition message from the R interpreter into Java.

3.4.5. Pushing data from Java to R

As in many dynamic languages, R scripts are evaluated in the context of an environment that looks a lot like a dictionary. You can define new variables in this environment through the javax.script API, using the ScriptEngine.put() method.

Example:

engine.put("x", 4);
engine.put("y", new double[] { 1d, 2d, 3d, 4d });
engine.put("z", new DoubleArrayVector(1,2,3,4,5));
engine.put("hashMap", new java.util.HashMap<String, Object>());
// some R magic to print all objects and their class with a for-loop:
engine.eval("for (obj in ls()) { " +
    "cmd <- parse(text = paste('typeof(', obj, ')', sep = ''));" +
    "cat('type of ', obj, ' is ', eval(cmd), '\\n', sep = '') }");

Output:

type of hashMap is externalptr
type of x is integer
type of y is double
type of z is double

Renjin will implicitly convert primitives, arrays of primitives and String instances to R objects. Other Java objects will be wrapped as R externalptr objects. The example also shows the use of the DoubleArrayVector constructor to create a double vector in R. As you can see, we also managed to put a java.util.HashMap object into the global environment of the R session: working with such objects from R is the topic of the chapter Importing Java classes into R code.
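Because only primitives, arrays of primitives and String instances are converted implicitly, a Java collection such as a List<Double> would presumably arrive in R as an externalptr, just like the HashMap above. A simple way around this is to copy the values into a primitive array before calling put(). The helper below is a plain-Java sketch with hypothetical names; it assumes the list contains no null elements, and the call to engine.put() is shown only as a comment because it needs a Renjin engine in scope.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java helper; the class and method names are hypothetical, not Renjin API.
final class PushSketch {
    // Copies boxed values into a primitive array.
    // Assumes the list has no null elements; a null would cause a NullPointerException.
    static double[] toDoubleArray(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        double[] y = toDoubleArray(Arrays.asList(1.0, 2.0, 3.0, 4.0));
        // prints [1.0, 2.0, 3.0, 4.0]
        System.out.println(Arrays.toString(y));
        // With a Renjin engine in scope, the array could then be passed on:
        // engine.put("y", y);
    }
}
```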