Reputation: 1946
Apologies for the (I think) duplicate question; I'm new to C++ and have had a look around but am still stuck!
I have found a bash script that takes a .docx file and outputs the plain text.
unzip -p filename.docx word/document.xml | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'
This works great in bash.
I then tried to use it in my code:
FILE *fp = popen("unzip -p filename.docx word/document.xml | sed -e 's/<[^>]\\{1,\\}>//g; s/[^[:print:]]\\{1,\\}//g'", "r");
char buf[1024];
if (fp == NULL) {
    cout << "Error";
}
while (fgets(buf, 1024, fp)) {
    /* do something with buf */
    cout << buf;
}
fclose(fp);
Nothing is printed as a result of this.
The code works with simple bash commands such as 'ls'.
Any help would be much appreciated!
Upvotes: 0
Views: 1376
Reputation: 1
(I assume your program should run on some Linux system, or at least some POSIX one)
You should at least use pclose instead of fclose, and you should check the exit code returned by pclose.
As commented by Thab, don't forget that the backslash is an escape character inside string literals: the C++ compiler lexes \\ as a single backslash in your string literal constant. Every backslash that sed should see must therefore be written as \\ in the source, or you could use C++11 raw string literals.
(You should certainly check, e.g. with your debugger, which string popen is actually processing.)
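To make the escaping point concrete, here is a small sketch that writes the sed pipeline from the question both ways and checks that the two spellings are the same string. The names kEscaped, kRaw and docx_command are made up for this illustration.

```cpp
#include <cassert>
#include <string>

// Escaped form: every backslash meant for sed is doubled in the source,
// because the compiler turns "\\" into a single backslash.
const std::string kEscaped =
    "unzip -p filename.docx word/document.xml | "
    "sed -e 's/<[^>]\\{1,\\}>//g; s/[^[:print:]]\\{1,\\}//g'";

// Raw form (C++11): characters between R"( and )" are taken literally,
// so the command looks exactly as it does in the terminal.
const std::string kRaw =
    R"(unzip -p filename.docx word/document.xml | )"
    R"(sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g')";

// Hypothetical helper: returns the command to hand to popen().
// The assert documents that both spellings denote the same characters.
inline const std::string& docx_command() {
    assert(kEscaped == kRaw);
    return kRaw;
}
```

You would then call popen(docx_command().c_str(), "r"). Printing the string once before running it is a cheap way to see what the shell will actually receive.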
BTW, perhaps popen failed and you did not catch that. Replace
if (fp == NULL) {
    cout << "Error";
}
(missing std::endl, so the output was not flushed)
with
if (fp == nullptr) {
    // strerror needs <cstring>, errno needs <cerrno>
    clog << "popen failed: " << strerror(errno) << std::endl;
    exit(EXIT_FAILURE);
}
Also, I am not sure that this is a good approach to converting a .docx to .txt in batch mode on Linux. I would consider forking a LibreOffice or OpenOffice process to do the job (perhaps libreoffice --headless --cat and some more options). I don't know all the details; you'll need to RTFM.
BTW, you should probably code some small shell script to do the conversion, check and test it in the terminal, and call that shell script using popen (hence avoiding a command line with backslashes).
Finally, your C++ code is too C-like. I would suggest using getline(3), replacing
while (fgets(buf, 1024, fp)) {
    /* do something with buf */
    cout << buf;
}
with
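char* linbuf = nullptr;
size_t linsiz = 0;
do {
    // getline(3) allocates and grows linbuf as needed; it returns -1 at end of file or on error
    ssize_t linlen = getline(&linbuf, &linsiz, fp);
    if (linlen <= 0) break;
    // linbuf keeps its trailing newline, so flush instead of appending std::endl
    cout << std::string(linbuf, linlen) << std::flush;
} while (!feof(fp));
free(linbuf), linbuf = nullptr;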
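If you would rather collect the whole output than print it line by line, the same getline(3) loop can be wrapped in a function that returns a std::string. This is only a sketch for a POSIX system; read_command_output and its status parameter are names invented here, not part of any library.

```cpp
#include <cassert>
#include <stdio.h>      // popen, pclose, getline (POSIX)
#include <stdlib.h>     // free
#include <string>
#include <sys/types.h>  // ssize_t

// Hypothetical helper: runs `cmd` through popen() and returns everything the
// command wrote to its standard output. `status` receives the value returned
// by pclose(), or -1 if popen() itself failed.
std::string read_command_output(const std::string& cmd, int& status) {
    assert(!cmd.empty());
    std::string out;
    FILE* fp = popen(cmd.c_str(), "r");
    if (fp == nullptr) {
        status = -1;
        return out;
    }
    char* line = nullptr;  // getline() allocates and grows this buffer
    size_t cap = 0;
    ssize_t len;
    while ((len = getline(&line, &cap, fp)) != -1) {
        out.append(line, static_cast<size_t>(len));  // newline is kept
    }
    free(line);
    status = pclose(fp);  // pclose, not fclose, so the child is waited for
    return out;
}
```

Called as read_command_output("echo hello", status), it returns "hello\n" and sets status to 0 on a typical system. An empty result together with a nonzero status suggests the command itself failed.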
Of course replace at least your fclose(fp);
with
int excod = pclose(fp);
if (excod != 0)
    clog << "pclose failed " << excod << std::endl;
If you want to know more about the exit code, use the waitpid(2) related macros on excod (e.g. WIFEXITED, WEXITSTATUS, WIFSIGNALED, WTERMSIG, etc.).
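As a rough illustration of those macros, the sketch below turns the value returned by pclose into a readable message. describe_status and run_and_describe are hypothetical helper names, and the code assumes a POSIX system where popen runs the command through /bin/sh.

```cpp
#include <cassert>
#include <stdio.h>     // popen, pclose, fgets
#include <string>
#include <sys/wait.h>  // WIFEXITED, WEXITSTATUS, WIFSIGNALED, WTERMSIG

// Hypothetical helper: decodes a wait status as returned by pclose().
std::string describe_status(int status) {
    if (status == -1) return "pclose failed";
    if (WIFEXITED(status))
        return "exited with code " + std::to_string(WEXITSTATUS(status));
    if (WIFSIGNALED(status))
        return "killed by signal " + std::to_string(WTERMSIG(status));
    return "unknown status " + std::to_string(status);
}

// Hypothetical helper: runs `cmd`, discards its output, describes the result.
std::string run_and_describe(const char* cmd) {
    assert(cmd != nullptr);
    FILE* fp = popen(cmd, "r");
    if (fp == nullptr) return "popen failed";
    char buf[256];
    while (fgets(buf, sizeof buf, fp) != nullptr) {
        // drain the pipe so the child is never blocked on a full pipe
    }
    return describe_status(pclose(fp));
}
```

For example, run_and_describe("exit 3") yields "exited with code 3". A missing unzip binary would typically show up as exit code 127 from the shell.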
Don't forget to compile with all warnings and debug info (g++ -Wall -Wextra -g), and to use the debugger (gdb), strace(1), and valgrind.
Do care about flushing your buffers (using std::flush, std::endl, fflush(3), etc.) when starting a process with fork(2) (or with system(3) or popen(3), which fork internally).
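A minimal sketch of the flushing advice, assuming a POSIX system: the parent prints a banner, flushes both the C++ and the C buffers, and only then starts a command that writes to the same standard output. The function name announce_and_run is made up for this example.

```cpp
#include <cassert>
#include <iostream>
#include <stdio.h>  // popen, pclose, fflush

// Hypothetical helper: prints a banner, flushes, then lets `cmd` write
// directly to the same standard output. Returns the status from pclose(),
// or -1 if popen() failed.
int announce_and_run(const char* cmd) {
    assert(cmd != nullptr);
    std::cout << "running: " << cmd << '\n';
    std::cout.flush();  // push the C++ stream's buffer to the OS
    fflush(nullptr);    // flush every open C stdio output stream as well
    FILE* fp = popen(cmd, "w");  // mode "w": the child inherits our stdout
    if (fp == nullptr) return -1;
    return pclose(fp);  // closes the pipe and waits for the command
}
```

Without the two flush calls, the banner could still be sitting in the parent's buffer while the child is already writing, so the lines may come out in the wrong order when output is redirected to a file or a pipe.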
Upvotes: 5