3.5. Moving Data between Java and R Code
If you have read the chapter Evaluating R Language Code in this guide, you already know how to execute R code from a Java application. In this chapter we take things a little further and explain how you can move data between Java and R code.
Renjin provides a mapping from R language types to Java objects. To use this mapping effectively you should have at least a basic understanding of R’s object types. The next section provides a short introduction which is essentially a condensed version of the relevant material in the R Language Definition manual. If you are already familiar with R’s object types you can skip this section and head straight to the section Pulling data from R into Java or Pushing data from Java to R.
3.5.1. A Java Developer’s Guide to R Objects
R has a number of object types that are referred to as basic types. Of these, we only discuss those that are most frequently encountered by users of R: vectors, lists, functions, and the NULL object. We also discuss the two common compound objects in R, namely data frames and factors.
3.5.1.1. Attributes
Before we discuss these objects, it is important to know that all objects except the NULL object can have one or more attributes. Common attributes are the names attribute, which contains the element names; the class attribute, which stores the name of the class of the object; and the dim attribute with its optional dimnames companion, which store the size of each dimension and the names along each dimension of the object. For each object, the attributes() function returns a list with the attributes and their values. The value of a specific attribute can be obtained using the attr() function. For example, attr(x, "class") returns the name of the class of the object (or NULL if the attribute is not defined).
3.5.1.2. Vectors
There are six basic vector types, which are referred to as the atomic vector types. These are:
- logical: a boolean value (for example: TRUE)
- integer: an integer value (for example: 1L)
- double: a real number (for example: 1.5)
- character: a character string (for example: "foobar")
- complex: a complex number (for example: 1+2i)
- raw: uninterpreted bytes (rarely needed; you can safely ignore this one)
These vectors have a length and can be indexed using [ as the following sample R session demonstrates:
> x <- 2
> length(x)
[1] 1
> y <- c(2, 3)
> y[2]
[1] 3
As you can see, even single numbers are vectors with length equal to one.
Vectors in R can have missing values that are represented as NA. Because all elements in a vector must be of the same type (i.e. logical, double, integer, etc.), there are multiple types of NA. However, the casual R user will generally not be concerned with the different types of NA.
> x <- c(1, NA, 3)
> x
[1] 1 NA 3
> y <- as.character(NA)
> y
[1] NA
> typeof(NA) # default type of NA is logical
[1] "logical"
> typeof(y) # but we have coerced 'y' to a character vector
[1] "character"
R’s typeof() function returns the internal type of each object. In the example above, y is a character vector.
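Java primitives have no built-in notion of a missing value, so a Java developer has to think about how NA is encoded once R data reaches Java. The sketch below is plain Java and does not use the Renjin API. It assumes the convention of GNU R, where the integer NA is the smallest 32-bit integer; the names NA_INTEGER and meanIgnoringNa are made up for this illustration, and Renjin's vector classes have their own way of representing NA that this sketch does not try to reproduce.

```java
import java.util.OptionalDouble;

final class IntegerNaSketch {
    // Assumption: integer NA is encoded as the smallest int, as in GNU R.
    static final int NA_INTEGER = Integer.MIN_VALUE;

    static boolean isNa(int value) {
        return value == NA_INTEGER;
    }

    // Mean of the non-missing elements, similar in spirit to mean(x, na.rm = TRUE).
    // Returns an empty result when every element is missing.
    static OptionalDouble meanIgnoringNa(int[] values) {
        long sum = 0;
        int count = 0;
        for (int v : values) {
            if (!isNa(v)) {
                sum += v;
                count++;
            }
        }
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of((double) sum / count);
    }

    public static void main(String[] args) {
        int[] x = {1, NA_INTEGER, 3}; // mirrors c(1L, NA, 3L)
        System.out.println(meanIgnoringNa(x).getAsDouble()); // prints 2.0
    }
}
```

A sentinel value keeps the data in a primitive array, at the cost of having to check for it before every arithmetic operation.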
3.5.1.3. Factors
Factors are one of R’s compound data types. Internally, they are represented by integer vectors with a levels attribute. The following sample R session creates such a factor from a character vector:
> x <- sample(c("A", "B", "C"), size = 10, replace = TRUE)
> x
[1] "C" "B" "B" "C" "A" "A" "B" "B" "C" "B"
> as.factor(x)
[1] C B B C A A B B C B
Levels: A B C
Internally, the factor in this example is stored as the integer vector c(3, 2, 2, 3, 1, 1, 2, 2, 3, 2), whose values are the indices of the letters in the character vector c("A", "B", "C") stored in the levels attribute.
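To make the relation between the integer codes and the levels concrete, here is a minimal sketch in plain Java that turns the codes from the example back into labels. It does not use the Renjin API; the class and method names are hypothetical, and missing values in the codes are not handled.

```java
import java.util.Arrays;

final class FactorSketch {
    // Translate 1-based factor codes into their labels.
    static String[] decode(int[] codes, String[] levels) {
        String[] labels = new String[codes.length];
        for (int i = 0; i < codes.length; i++) {
            labels[i] = levels[codes[i] - 1]; // R indices start at 1, Java indices at 0
        }
        return labels;
    }

    public static void main(String[] args) {
        int[] codes = {3, 2, 2, 3, 1, 1, 2, 2, 3, 2};
        String[] levels = {"A", "B", "C"};
        System.out.println(Arrays.toString(decode(codes, levels)));
        // prints [C, B, B, C, A, A, B, B, C, B]
    }
}
```

The off-by-one adjustment is the part that is easy to forget when factor codes are read from Java.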
3.5.1.4. Lists
Lists are R’s go-to structure for representing composite data. They can contain multiple elements, each of which can be of a different type. Record-like structures can be created by naming each element in the list. The lm() function, for example, returns a list that contains many details about the fitted linear model. The following R session shows the difference between a list and a list with named elements:
> l <- list("Jane", 23, c(6, 7, 9, 8))
> l
[[1]]
[1] "Jane"
[[2]]
[1] 23
[[3]]
[1] 6 7 9 8
> l <- list(name = "Jane", age = 23, scores = c(6, 7, 9, 8))
> l
$name
[1] "Jane"
$age
[1] 23
$scores
[1] 6 7 9 8
In R, lists are also known as generic vectors. They have a length that is equal to the number of elements in the list.
3.5.1.5. Data frames
Data frames are one of R’s compound data types. They are lists of vectors, factors and/or matrices, all having the same length. The data frame is one of the most important concepts in statistics and has equivalent implementations in SAS and SPSS.
The following sample R session shows how a data frame is constructed, what its attributes are and that it is indeed a list:
> df <- data.frame(x = seq(5), y = runif(5))
> df
x y
1 1 0.8773874
2 2 0.4977048
3 3 0.6719721
4 4 0.2135386
5 5 0.3834681
> class(df)
[1] "data.frame"
> attributes(df)
$names
[1] "x" "y"
$row.names
[1] 1 2 3 4 5
$class
[1] "data.frame"
> is.list(df)
[1] TRUE
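For readers who think in Java collections, the column-oriented layout of a data frame can be pictured as an ordered map from column names to equally long arrays. The following sketch is plain Java with hypothetical names (DataFrameSketch, addColumn, row). It is only a mental model restricted to numeric columns and is not how Renjin represents data frames.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class DataFrameSketch {
    // Columns keyed by name; LinkedHashMap preserves the column order.
    private final Map<String, double[]> columns = new LinkedHashMap<>();
    private int rowCount = -1; // -1 until the first column is added

    DataFrameSketch addColumn(String name, double... values) {
        if (rowCount >= 0 && values.length != rowCount) {
            throw new IllegalArgumentException(
                "column '" + name + "' has " + values.length + " rows, expected " + rowCount);
        }
        rowCount = values.length;
        columns.put(name, values.clone());
        return this;
    }

    int rowCount() {
        return Math.max(rowCount, 0);
    }

    // A "row" is the same position read from every column.
    double[] row(int index) {
        double[] row = new double[columns.size()];
        int j = 0;
        for (double[] column : columns.values()) {
            row[j++] = column[index];
        }
        return row;
    }

    public static void main(String[] args) {
        DataFrameSketch df = new DataFrameSketch()
            .addColumn("x", 1, 2, 3)
            .addColumn("y", 0.5, 1.5, 2.5);
        System.out.println(Arrays.toString(df.row(1))); // prints [2.0, 1.5]
    }
}
```

Reading a whole column is cheap in this layout, while reading a row touches every column, which is worth keeping in mind when processing a data frame from Java.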
3.5.1.6. Matrices and arrays
Besides one-dimensional vectors, R has two other classes to represent array-like data: matrix and array. A matrix is simply an atomic vector with a dim attribute that contains a numeric vector of length two:
> x <- seq(9)
> class(x)
[1] "integer"
> dim(x) <- c(3, 3)
> class(x)
[1] "matrix"
> x
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
Likewise, an array is also a vector with a dim attribute that contains a numeric vector of length greater than two:
> y <- seq(8)
> dim(y) <- c(2,2,2)
> class(y)
[1] "array"
The example with the matrix shows that the elements in an array are stored in column-major order, which is important to know when we want to access R arrays from a Java application.
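Column-major order means that the first index varies fastest in the underlying vector. The sketch below shows the index arithmetic in plain Java for matrices and higher-dimensional arrays alike. The class and method names are hypothetical and no Renjin classes are involved.

```java
final class ColumnMajor {
    // Zero-based position of an element in the flat vector, given the size of
    // each dimension and zero-based indices (the first index varies fastest).
    static int offset(int[] dim, int... index) {
        if (index.length != dim.length) {
            throw new IllegalArgumentException("expected " + dim.length + " indices");
        }
        int position = 0;
        int stride = 1;
        for (int k = 0; k < dim.length; k++) {
            if (index[k] < 0 || index[k] >= dim[k]) {
                throw new IndexOutOfBoundsException("index " + k + " is out of range");
            }
            position += index[k] * stride;
            stride *= dim[k];
        }
        return position;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4, 5, 6, 7, 8, 9}; // seq(9)
        int[] dim = {3, 3};
        // R's x[2, 3] is row index 1 and column index 2 in zero-based terms:
        System.out.println(values[offset(dim, 1, 2)]); // prints 8
    }
}
```

For a matrix this reduces to row + column * numberOfRows, which matches the printed 3x3 matrix above, where the element in row 2 and column 3 is 8.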
Note
In both examples for the matrix and array objects, the class() function derives the class from the fact that the object is an atomic vector with the dim attribute set. Unlike data frames, these objects do not have a class attribute.
3.5.2. Overview of Renjin’s type system
Renjin has corresponding classes for all of the R object types discussed in the section A Java Developer’s Guide to R Objects. Table Renjin’s Java classes for common R object types summarizes these object types and their Java classes. In R, the object type is returned by the typeof() function.
R object type | Renjin class |
---|---|
logical | LogicalVector |
integer | IntVector |
double | DoubleVector |
character | StringVector |
complex | ComplexVector |
raw | RawVector |
list | ListVector |
function | Function |
environment | Environment |
NULL | Null |
There is a certain hierarchy in Renjin’s Java classes for the different object types in R. Figure Hierarchy in Renjin’s type system gives a full picture of all classes that make up Renjin’s type system. These classes are contained in the org.renjin.sexp Java package. The vector classes listed in table Renjin’s Java classes for common R object types are in fact abstract classes that can have different implementations. For example, DoubleArrayVector (not shown in the figure) is an implementation of the DoubleVector abstract class. SEXP, Vector, and AtomicVector are all Java interfaces.
Note
Renjin does not have a separate Java class for every class of object known to (base) R. Objects of class matrix and array are represented by one of the AtomicVector classes, and R’s compound objects factor and data.frame are represented by an IntVector and a ListVector respectively.
3.5.3. Pulling data from R into Java
Now that you have a good understanding of both R’s object types and how these types are mapped to Renjin’s Java classes, we can start by pulling data from R code into our Java application. A typical scenario is one where an R script performs a calculation and the result is pulled into the Java application for further processing.
Using the Renjin Script Engine as introduced in the chapter Evaluating R Language Code, we can store the result of a calculation from R into a Java object. By default, the eval() method of javax.script.ScriptEngine returns an Object, i.e. Java’s object superclass. We can always cast this result to a SEXP object. The following Java snippet shows how this is done and how the Object.getClass() and Class.getName() methods can be used to determine the actual class of the R result:
// evaluate Renjin code from String:
SEXP res = (SEXP)engine.eval("a <- 2; b <- 3; a*b");
// print the result to stdout:
System.out.println("The result of a*b is: " + res);
// determine the Java class of the result:
Class<?> objectType = res.getClass();
System.out.println("Java class of 'res' is: " + objectType.getName());
// use the getTypeName() method of the SEXP object to get R's type name:
System.out.println("In R, typeof(res) would give '" + res.getTypeName() + "'");
This should write the following to the standard output:
The result of a*b is: 6.0
Java class of 'res' is: org.renjin.sexp.DoubleArrayVector
In R, typeof(res) would give 'double'
As you can see, the getTypeName() method of the SEXP interface returns a String with R’s name for the object type.
Note
Don’t forget to import org.renjin.sexp.* to make Renjin’s type classes available to your application.
In the example above we could have also cast R’s result to a DoubleVector object:
DoubleVector res = (DoubleVector)engine.eval("a <- 2; b <- 3; a*b");
or you could cast it to a Vector:
Vector res = (Vector)engine.eval("a <- 2; b <- 3; a*b");
You can’t cast R integer results to a DoubleVector: the following snippet will throw a ClassCastException:
// use R's 'L' suffix to define an integer:
DoubleVector res = (DoubleVector)engine.eval("1L");
As mentioned in “Capturing results from Renjin”, if you have more complex scripts you can fetch individual values by name, for example:
engine.eval("someVar <- 123 \n otherVar <- 'hello'");
Environment global = engine.getSession().getGlobalEnvironment();
Context topContext = engine.getSession().getTopLevelContext();
DoubleVector numVec = (DoubleVector)global.getVariable(topContext, "someVar");
StringVector strVec = (StringVector)global.getVariable(topContext, "otherVar");
int someVar = numVec.getElementAsInt(0);
String otherVar = strVec.asString();
// do stuff with the variables created in your script
3.5.3.1. Accessing individual elements of vectors
Now that we know how to pull R objects into our Java application we want to work with these data types in Java. In this section we show how individual elements of the Vector objects can be accessed in Java.
As you know, each vector type in R, and thus also in Renjin, has a length, which can be obtained with the length() method. Individual elements of a vector can be obtained with the getElementAsXXX() methods, where XXX is one of Double, Int, String, Logical, and Complex. The following snippet demonstrates this:
Vector x = (Vector)engine.eval("x <- c(6, 7, 8, 9)");
System.out.println("The vector 'x' has length " + x.length());
for (int i = 0; i < x.length(); i++) {
    System.out.println("Element x[" + (i + 1) + "] is " + x.getElementAsDouble(i));
}
This will write the following to the standard output:
The vector 'x' has length 4
Element x[1] is 6.0
Element x[2] is 7.0
Element x[3] is 8.0
Element x[4] is 9.0
As we have seen in the Lists section above, lists in R are also known as generic vectors, but accessing their elements, which are themselves vectors, requires a bit more care. If an element of a list (i.e. a vector) has length equal to one, we can access its value directly using one of the getElementAsXXX() methods. For example:
ListVector x =
(ListVector)engine.eval("x <- list(name = \"Jane\", age = 23, scores = c(6, 7, 8, 9))");
System.out.println("List 'x' has length " + x.length());
// directly access the first (and only) element of the vector 'x$name':
System.out.println("x$name is '" + x.getElementAsString(0) + "'");
which will result in:
List 'x' has length 3
x$name is 'Jane'
being printed to standard output. However, this approach will not work for the third element of the list, as this is a vector with length greater than one.
The preferred approach for lists is to get each element as a SEXP object first and then to handle each of these accordingly. For example:
DoubleVector scores = (DoubleVector)x.getElementAsSEXP(2);
3.5.3.2. Dealing with matrices
As described in the section Matrices and arrays above, matrices are simply vectors with the dim attribute set to an integer vector of length two. To identify a matrix in Renjin, we therefore need to check for the presence of this attribute and its value. Since any object in R can have one or more attributes, the SEXP interface defines a number of methods for dealing with attributes. In particular, hasAttributes() returns true if there are any attributes defined on an object and getAttributes() returns these attributes as an AttributeMap.
Vector res = (Vector)engine.eval("matrix(seq(9), nrow = 3)");
if (res.hasAttributes()) {
    AttributeMap attributes = res.getAttributes();
    Vector dim = attributes.getDim();
    if (dim == null) {
        System.out.println("Result is a vector of length " +
            res.length());
    } else {
        if (dim.length() == 2) {
            System.out.println("Result is a " +
                dim.getElementAsInt(0) + "x" +
                dim.getElementAsInt(1) + " matrix.");
        } else {
            System.out.println("Result is an array with " +
                dim.length() + " dimensions.");
        }
    }
}
Output:
Result is a 3x3 matrix.
For convenience, Renjin includes a wrapper class Matrix that provides easier access to the number of rows and columns.
Example:
// required import(s):
import org.renjin.primitives.matrix.*;
Vector res = (Vector)engine.eval("matrix(seq(9), nrow = 3)");
try {
    Matrix m = new Matrix(res);
    System.out.println("Result is a " + m.getNumRows() + "x"
        + m.getNumCols() + " matrix.");
} catch(IllegalArgumentException e) {
    System.out.println("Result is not a matrix: " + e);
}
Output:
Result is a 3x3 matrix.
3.5.3.3. Dealing with lists and data frames
The ListVector class contains several convenience methods to access a list’s components from Java. For example, we can extract a component from a fitted linear model using the name of the element that contains it:
ListVector model = (ListVector)engine.eval("x <- 1:10; y <- x*3; lm(y ~ x)");
Vector coefficients = model.getElementAsVector("coefficients");
// same result, but less convenient:
// int i = model.indexOfName("coefficients");
// Vector coefficients = (Vector)model.getElementAsSEXP(i);
System.out.println("intercept = " + coefficients.getElementAsDouble(0));
System.out.println("slope = " + coefficients.getElementAsDouble(1));
Output:
intercept = -4.4938668397781774E-15
slope = 3.0
3.5.4. Handling errors generated by the R code
Up to now we have been able to execute R code without any concern for possible errors that may occur when the R code is evaluated. There are two common exceptions that may be thrown when R code is evaluated:
- ParseException: an exception thrown by Renjin’s R parser due to a syntax error, and
- EvalException: an exception thrown by Renjin when the R code generates an error condition, for example by the stop() function.
Here is an example which catches an exception from Renjin’s parser:
// required import(s):
import org.renjin.parser.ParseException;
try {
    engine.eval("x <- 1 +/ 1");
} catch (ParseException e) {
    System.out.println("R script parse error: " + e.getMessage());
}
Output:
R script parse error: Syntax error at line 1 char 0: syntax error, unexpected '/'
And here’s an example which catches an error condition thrown by the R interpreter:
// required import(s):
import org.renjin.eval.EvalException;
try {
    engine.eval("stop(\"Hello world!\")");
} catch (EvalException e) {
    // getCondition() returns the condition as an R list:
    Vector condition = (Vector)e.getCondition();
    // the first element of the list contains the actual error message:
    String msg = condition.getElementAsString(0);
    System.out.println("The R script threw an error: " + msg);
}
Output:
The R script threw an error: Hello world!
EvalException.getCondition() is required to pull the condition message from the R interpreter into Java.
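When several scripts are evaluated in a row, it can be convenient to funnel all failures through one helper. The following sketch uses only the standard javax.script API, so it compiles without Renjin on the classpath. The SafeEval class and its describe() method are hypothetical names, and the sketch assumes that Renjin's ParseException and EvalException are unchecked exceptions, so that they are caught by the RuntimeException branch.

```java
import java.util.concurrent.Callable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

final class SafeEval {
    // Runs one evaluation and turns any failure into a readable message.
    static String describe(Callable<Object> evaluation) {
        try {
            return "ok: " + evaluation.call();
        } catch (ScriptException e) {
            // checked exception declared by ScriptEngine.eval()
            return "script engine error: " + e.getMessage();
        } catch (RuntimeException e) {
            // assumption: Renjin's ParseException and EvalException end up here
            return "evaluation failed: " + e.getMessage();
        } catch (Exception e) {
            return "unexpected error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("Renjin");
        if (engine == null) {
            System.out.println("Renjin script engine not found on the classpath");
            return;
        }
        System.out.println(describe(() -> engine.eval("stop('Hello world!')")));
    }
}
```

Catching the specific Renjin exception types, as in the snippets above, remains the better choice when the application needs the R condition object itself rather than just a message.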
3.5.5. Pushing data from Java to R
As in many dynamic languages, R scripts are evaluated in the context of an environment that looks a lot like a dictionary. You can define new variables in this environment through the javax.script API, using the ScriptEngine.put() method.
Example:
engine.put("x", 4);
engine.put("y", new double[] { 1d, 2d, 3d, 4d });
engine.put("z", new DoubleArrayVector(1,2,3,4,5));
engine.put("hashMap", new java.util.HashMap());
// some R magic to print all objects and their class with a for-loop:
engine.eval("for (obj in ls()) { " +
    "cmd <- parse(text = paste('typeof(', obj, ')', sep = ''));" +
    "cat('type of ', obj, ' is ', eval(cmd), '\\n', sep = '') }");
Output:
type of hashMap is externalptr
type of x is integer
type of y is double
type of z is double
Renjin will implicitly convert primitives, arrays of primitives and String instances to R objects. Other Java objects will be wrapped as R externalptr objects. The example also shows the use of the DoubleArrayVector constructor to create a double vector in R. You can see that we managed to put a Java java.util.HashMap object into the global environment of the R session: this is the topic of the chapter Importing Java classes into R code.
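Pushing and pulling are typically combined: a Java array is placed in the R environment, R computes on it, and the result travels back. The sketch below shows this round trip using only the standard javax.script API. The class and method names are hypothetical, and the code assumes that the Renjin engine is registered under the name "Renjin" and falls back to a message when it is not available.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

final class PushAndCompute {
    // Pushes the values into R as 'y', lets R compute the mean and
    // returns a printable description of the result.
    static String meanInR(ScriptEngine engine, double[] values) {
        if (engine == null) {
            return "Renjin script engine not found on the classpath";
        }
        try {
            engine.put("y", values); // a double[] becomes an R double vector
            Object mean = engine.eval("mean(y)");
            return "mean(y) = " + mean;
        } catch (ScriptException e) {
            throw new IllegalStateException("evaluation of mean(y) failed", e);
        }
    }

    public static void main(String[] args) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("Renjin");
        System.out.println(meanInR(engine, new double[] { 1d, 2d, 3d, 4d }));
    }
}
```

The result is kept as a plain Object here to stay independent of Renjin's classes; in a real application it would be cast to a Vector or DoubleVector as described in the section Pulling data from R into Java.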