yuw444

Reputation: 426

Using the `.C` interface of R to read and write files

I am trying to filter a huge text file line by line, which pure R is not good at, so I wrote a C function that will hopefully speed up the process. Below is a minimal working example, filter.c, for demo purposes.

I have tried the .C interface without luck. Here is my attempt:

  1. Built lfilter.so using gcc -shared -o lfilter.so -fPIC filter.c
  2. dyn.load("lfilter.so")
  3. .C("filter", as.character("I1.txt"), as.character("I1.out.txt"), as.character("filter.txt"))

R crashes at the third step. Unfortunately, I have to stay within R.

Any help or suggestions are welcome.

filter.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LL 256

int get_row(char *filename)
{
  char line[LL];
  int i = 0;
  FILE *stream = fopen(filename, "r");
  while (fgets(line, LL, stream))
  {
    i++;
  }
  fclose(stream);
  return i;
}


void filter(char *R1_in,
            char *R1_out,
            char *filter)
{
  char R1_line[LL];
  
  FILE *R1_stream = fopen(R1_in, "r");
  FILE *R1_out_stream = fopen(R1_out,"w");
 
  /*****************loading filters*******************/
  int nrows = get_row(filter);
  
  FILE *filter_stream = fopen(filter, "r");
  
  char **filter_list = (char **)malloc(nrows * sizeof(*filter_list));
  for(int i = 0; i <nrows; i++)
  {
    filter_list[i] = malloc(LL * sizeof(char));
    fgets(filter_list[i], LL, filter_stream);
  }
  
  fclose(filter_stream);
  
  /*****************filtering*******************/
  
  while (fgets(R1_line, LL, R1_stream))
  {
    // printf("%s", R1_line);
    
    for(int i = 0; i<nrows; i++)
    {
      if(strcmp(R1_line, filter_list[i])==0)
      {
        fprintf(R1_out_stream, "%s", R1_line);
        break;
      } 
    }
  }
  printf("\n");
  
  for(int i=0; i<nrows; i++)
  {
    free(filter_list[i]);
  }
  free(filter_list);
  
  fclose(R1_stream);
  fclose(R1_out_stream);
  
}

// int main()
// {
//   char R1_in[] = "I1.txt";
//   char R1_out[] = "I1.out.txt";
// 
//   char filters[] = "filter.txt";
// 
//   filter(R1_in, R1_out, filters);
//   return 0;
// }

I1.txt

aa
baddf
ca
daa

filter.txt

ca
cb

Expected Output I1.out.txt

ca

Upvotes: 3

Views: 159

Answers (1)

Craig Estey

Reputation: 33601

I had never used R before, but I was intrigued, so I installed R and did a little research.

Everything in R [using the .C interface] is passed to the C function as a pointer.

From: https://www.r-bloggers.com/2014/02/three-ways-to-call-cc-from-r/ we have:

Inside a running R session, the .C interface allows objects to be directly accessed in an R session’s active memory. Thus, to write a compatible C function, all arguments must be pointers. No matter the nature of your function’s return value, it too must be handled using pointers. The C function you will write is effectively a subroutine.

So, if we pass an integer, the C function argument must be:

int *
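
As a minimal sketch of that rule, here is a hypothetical function `add_one` (the name and the task are assumptions for illustration, not from the question) that increments every element of an integer vector. The length is passed as a second pointer argument, since the C side has no other way to learn it through `.C`.

```c
/* Hypothetical example: add 1 to each element of an R integer vector.
 * Called through .C, both arguments arrive as pointers. */
void add_one(int *x, int *n)
{
    for (int i = 0; i < *n; i++) {
        x[i] += 1;   /* updates the vector that .C hands back to R */
    }
}
```

From R, after compiling and loading the shared object, this would be called along the lines of .C("add_one", x = as.integer(c(1, 2, 3)), n = as.integer(3))$x, which returns the modified vector.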

I took a guess that:

char *

needed to be:

char **

and then tested it with:

#include <stdio.h>

#define SHOW(_sym) \
    show(#_sym,_sym)

static void
show(const char *sym,char **ptr)
{
    char *str;

    /* %p expects a void *, so cast both pointers */
    printf("%s: ptr=%p",sym,(void *)ptr);

    str = *ptr;
    printf(" str=%p",(void *)str);

    printf(" '%s'\n",str);
}

void
filter(char **R1_in,char **R1_out,char **filt)
{

    SHOW(R1_in);
    SHOW(R1_out);
    SHOW(filt);
}

Here is the output:

> dyn.load("filter.so");
> .C("filter",
+   as.character("abc"),
+   as.character("def"),
+   as.character("ghi"))
R1_in: ptr=0x55a9f8cb1798 str=0x55a9f9de9760 'abc'
R1_out: ptr=0x55a9f8cb1818 str=0x55a9f9de9728 'def'
filt: ptr=0x55a9f8cb1898 str=0x55a9f9de96f0 'ghi'
[[1]]
[1] "abc"

[[2]]
[1] "def"

[[3]]
[1] "ghi"

> q()
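
The same layout extends to character vectors with more than one element: the `char **` points to an array of string pointers, one per element. A minimal sketch, where the function name `total_chars` and the separate length argument `n` are assumptions for illustration and not part of the question:

```c
#include <string.h>

/* Hypothetical example: sum the lengths of all strings in an R
 * character vector. strs[i] is the i-th element as a C string. */
void total_chars(char **strs, int *n, int *total)
{
    *total = 0;
    for (int i = 0; i < *n; i++) {
        *total += (int)strlen(strs[i]);
    }
}
```

With a single file name per argument, as in the question, only the first element (`*ptr`, that is `ptr[0]`) matters.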

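So the guess holds: .C passes each character argument as char **, and dereferencing it gives the string. You want: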

void
filter(char **R1_in, char **R1_out, char **filt)
{

    FILE *R1_stream = fopen(*R1_in, "r");

    // ...
}
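
Putting that together, a fuller sketch of the question's `filter` with the `char **` signature might look like the following. The early returns on a failed `fopen`, the single contiguous buffer for the filter lines, and the helper name `count_lines` are additions for illustration, not part of the original code. A NULL stream passed to `fgets` is another way to crash R, so the checks seem worth having.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LL 256

/* Count lines in a file; returns -1 if it cannot be opened. */
static int count_lines(const char *filename)
{
    char line[LL];
    int n = 0;
    FILE *stream = fopen(filename, "r");
    if (stream == NULL)
        return -1;
    while (fgets(line, LL, stream))
        n++;
    fclose(stream);
    return n;
}

/* .C passes each R character vector as char **, so each file name
 * is the first element: *R1_in, *R1_out, *filt. */
void filter(char **R1_in, char **R1_out, char **filt)
{
    char line[LL];
    int nrows = count_lines(*filt);
    if (nrows < 0)
        return;

    FILE *in = fopen(*R1_in, "r");
    FILE *out = fopen(*R1_out, "w");
    FILE *fs = fopen(*filt, "r");
    if (in == NULL || out == NULL || fs == NULL) {
        if (in) fclose(in);
        if (out) fclose(out);
        if (fs) fclose(fs);
        return;
    }

    /* One block holding nrows filter lines of up to LL chars each. */
    char (*filters)[LL] = malloc(((size_t)nrows + 1) * sizeof *filters);
    if (filters != NULL) {
        int loaded = 0;
        while (loaded < nrows && fgets(filters[loaded], LL, fs))
            loaded++;

        /* Copy every input line that exactly matches a filter line. */
        while (fgets(line, LL, in)) {
            for (int i = 0; i < loaded; i++) {
                if (strcmp(line, filters[i]) == 0) {
                    fputs(line, out);
                    break;
                }
            }
        }
        free(filters);
    }

    fclose(fs);
    fclose(in);
    fclose(out);
}
```

With the question's I1.txt and filter.txt, this should write the single line ca to I1.out.txt. As in the original, the comparison includes the trailing newline, so a final line without one will not match a filter line that has one.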

Upvotes: 2
