I am trying to filter a huge text file line by line, which pure R is not very good at, so I wrote a C function that will hopefully speed up the process. Below is a minimal working example, filter.c, for demonstration purposes.
So far I have tried .C, without luck. Here is my attempt:
1. Compile filter.c into a shared library, lfilter.so, using gcc -shared -o lfilter.so -fPIC filter.c
2. dyn.load("lfilter.so")
3. .C("filter", as.character("I1.txt"), as.character("I1.out.txt"), as.character("filter.txt"))
R crashed on me at the third step. Unfortunately, I have to stay within R.
Any help or suggestions are welcome.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LL 256

int get_row(char *filename)
{
    char line[LL];
    int i = 0;
    FILE *stream = fopen(filename, "r");
    while (fgets(line, LL, stream))
    {
        i++;
    }
    fclose(stream);
    return i;
}

void filter(char *R1_in,
            char *R1_out,
            char *filter)
{
    char R1_line[LL];
    FILE *R1_stream = fopen(R1_in, "r");
    FILE *R1_out_stream = fopen(R1_out, "w");

    /*****************loading filters*******************/
    int nrows = get_row(filter);
    FILE *filter_stream = fopen(filter, "r");
    char **filter_list = (char **)malloc(nrows * sizeof(*filter_list));
    for (int i = 0; i < nrows; i++)
    {
        filter_list[i] = malloc(LL * sizeof(char));
        fgets(filter_list[i], LL, filter_stream);
    }
    fclose(filter_stream);

    /*****************filtering*******************/
    while (fgets(R1_line, LL, R1_stream))
    {
        // printf("%s", R1_line);
        for (int i = 0; i < nrows; i++)
        {
            if (strcmp(R1_line, filter_list[i]) == 0)
            {
                fprintf(R1_out_stream, "%s", R1_line);
                break;
            }
        }
    }
    printf("\n");

    for (int i = 0; i < nrows; i++)
    {
        free(filter_list[i]);
    }
    free(filter_list);
    fclose(R1_stream);
    fclose(R1_out_stream);
}

// int main()
// {
//     char R1_in[] = "I1.txt";
//     char R1_out[] = "I1.out.txt";
//
//     char filters[] = "filter.txt";
//
//     filter(R1_in, R1_out, filters);
//     return 0;
// }
aa
baddf
ca
daa
ca
cb
ca
I had never used R before, but I was intrigued, so I installed it and did a little research.
Everything passed from R to a C function through the .C interface arrives as a pointer.
From https://www.r-bloggers.com/2014/02/three-ways-to-call-cc-from-r/ we have:
Inside a running R session, the .C interface allows objects to be directly accessed in an R session’s active memory. Thus, to write a compatible C function, all arguments must be pointers. No matter the nature of your function’s return value, it too must be handled using pointers. The C function you will write is effectively a subroutine.
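To make the quote concrete, here is a minimal sketch of that subroutine style. The functions scale_vec and sum_vec are hypothetical illustrations, not part of the question: a numeric vector arrives as double *, an integer vector as int *, the C function returns void, and any result is written back through a pointer argument.

```c
/* Hypothetical example: multiply each of the *n elements of x by *factor,
 * in place. From R this could be called as
 *   .C("scale_vec", x = as.double(1:3), n = as.integer(3L), factor = as.double(2))$x
 * Every argument arrives as a pointer, even the scalars n and factor. */
void scale_vec(double *x, int *n, double *factor)
{
    for (int i = 0; i < *n; i++) {
        x[i] *= *factor;
    }
}

/* Hypothetical example: a "return value" handled through a pointer.
 * The sum of x is stored in *total, because .C does not use the
 * C function's return value. */
void sum_vec(double *x, int *n, double *total)
{
    double s = 0.0;
    for (int i = 0; i < *n; i++) {
        s += x[i];
    }
    *total = s;
}
```

Note the as.integer/as.double coercions on the R side: .C does not convert types for you, so the R vector's type has to match the pointer type the C code expects.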
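So, if we pass an integer, the corresponding C parameter must be int *.
By the same logic, I guessed that char * needed to be char ** (an R character vector is an array of C strings, so each argument is a pointer to that array), and then tested it with: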
#include <stdio.h>

#define SHOW(_sym) \
    show(#_sym, _sym)

static void
show(const char *sym, char **ptr)
{
    char *str;

    printf("%s: ptr=%p", sym, (void *) ptr);
    str = *ptr;
    printf(" str=%p", (void *) str);
    printf(" '%s'\n", str);
}

void
filter(char **R1_in, char **R1_out, char **filt)
{
    SHOW(R1_in);
    SHOW(R1_out);
    SHOW(filt);
}
Here is the output:
> dyn.load("filter.so");
> .C("filter",
+ as.character("abc"),
+ as.character("def"),
+ as.character("ghi"))
R1_in: ptr=0x55a9f8cb1798 str=0x55a9f9de9760 'abc'
R1_out: ptr=0x55a9f8cb1818 str=0x55a9f9de9728 'def'
filt: ptr=0x55a9f8cb1898 str=0x55a9f9de96f0 'ghi'
[[1]]
[1] "abc"
[[2]]
[1] "def"
[[3]]
[1] "ghi"
> q()
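The list printed after the call is the value .C returns: the arguments, in order, as they stand after the C function has run. With .C that is the only way to get data back, so any result must be written through a pointer parameter. The sketch below uses a hypothetical count_equal function (not from the question) to show both points. It also shows why a character vector arrives as char **: words[i] is the i-th element of the R vector, and a length-one vector such as target is reached through *target.

```c
#include <string.h>

/* Hypothetical example: count how many of the *n strings in words equal
 * *target, and store the count in *count. From R this could be called as
 *   .C("count_equal", as.character(c("ca", "cb", "ca")), as.integer(3L),
 *      as.character("ca"), count = integer(1))$count
 * The result comes back as the (modified) count element of .C's list. */
void count_equal(char **words, int *n, char **target, int *count)
{
    int c = 0;
    for (int i = 0; i < *n; i++) {
        if (strcmp(words[i], *target) == 0) {
            c++;
        }
    }
    *count = c;
}
```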
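So, for the question's function, the parameters must be char **, and each file name is obtained by dereferencing:
void
filter(char **R1_in, char **R1_out, char **filt)
{
    FILE *R1_stream = fopen(*R1_in, "r");   // *R1_in is the file name string
    // ...
}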
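For completeness, here is a sketch of the whole function with that signature. Besides the char ** fix, it makes a few changes that are my own assumptions, not part of the question. It returns early instead of crashing when a file cannot be opened: the original most likely crashed because get_row received a garbage file name, fopen returned NULL, and fgets was then called on a NULL stream. It strips line endings before comparing, so a last line without a trailing newline still matches. It also drops the stray printf. Lines longer than LL - 1 characters are still split by fgets, as in the original.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LL 256

/* Remove a trailing "\n" (or "\r\n") so that the last line of a file
 * matches even when the file does not end with a newline. */
static void chomp(char *s)
{
    s[strcspn(s, "\r\n")] = '\0';
}

/* Count lines in a file; returns -1 instead of crashing if it can't be opened. */
static int get_row(const char *filename)
{
    char line[LL];
    int i = 0;
    FILE *stream = fopen(filename, "r");
    if (stream == NULL)
        return -1;
    while (fgets(line, LL, stream))
        i++;
    fclose(stream);
    return i;
}

/* .C-compatible version: every character argument is a char ** (a character
 * vector), so the file names are *R1_in, *R1_out and *filt. Copies to *R1_out
 * each line of *R1_in that equals one of the lines of *filt. */
void filter(char **R1_in, char **R1_out, char **filt)
{
    int nrows = get_row(*filt);
    if (nrows < 0)
        return;                         /* filter file missing: do nothing */

    FILE *in = fopen(*R1_in, "r");
    if (in == NULL)
        return;                         /* input missing: leave output alone */
    FILE *out = fopen(*R1_out, "w");
    if (out == NULL) {
        fclose(in);
        return;
    }

    /* Load the filter lines without their line endings. */
    char line[LL];
    char **filter_list = malloc((nrows > 0 ? nrows : 1) * sizeof *filter_list);
    FILE *filter_stream = fopen(*filt, "r");
    int loaded = 0;
    if (filter_list != NULL && filter_stream != NULL) {
        while (loaded < nrows && fgets(line, LL, filter_stream)) {
            chomp(line);
            filter_list[loaded] = malloc(strlen(line) + 1);
            if (filter_list[loaded] == NULL)
                break;
            strcpy(filter_list[loaded], line);
            loaded++;
        }
    }
    if (filter_stream != NULL)
        fclose(filter_stream);

    /* Write every input line that matches one of the filters. */
    while (fgets(line, LL, in)) {
        chomp(line);
        for (int i = 0; i < loaded; i++) {
            if (strcmp(line, filter_list[i]) == 0) {
                fprintf(out, "%s\n", line);
                break;
            }
        }
    }

    for (int i = 0; i < loaded; i++)
        free(filter_list[i]);
    free(filter_list);
    fclose(in);
    fclose(out);
}
```

Compiled and loaded as in the question, it is called exactly as before: .C("filter", as.character("I1.txt"), as.character("I1.out.txt"), as.character("filter.txt")).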