Moh Moh

Reputation: 11

How to read the content of different file types (.txt, .pdf, .docx) using the ReactFileReader component in React JS

I want to read the content of an uploaded file in React JS. The file may have different extensions (.txt, .docx, or .pdf). I am currently using the ReactFileReader component, and my code is below. It can read the content of a .txt file, but not the content of a .pdf or .docx file. How can I solve this? Please help me. Thank you.

import React, { Component } from "react";
import ReactFileReader from 'react-file-reader';
    
class DisplayController extends Component {
    constructor(props){
        super(props)
        this.state = {
            value: '',
            file : ""
        }
    }
    
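    handleFiles = files => {
        const reader = new FileReader();

        // readAsText decodes the raw bytes as text, so only plain-text files
        // such as .txt give readable output; .pdf and .docx are binary formats.
        reader.onload = () => {
            alert("Read Data : " + reader.result);
        };

        reader.readAsText(files[0]);
    }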
 
    render() {
        return (
            <form>
                <div className="files">
                    <ReactFileReader fileTypes={['.pdf','.txt','.docx']} handleFiles={this.handleFiles}>
                        <button className='btn'>Upload</button>
                    </ReactFileReader>
                </div>
            </form>
        )
    }
}
    
export default DisplayController;

Upvotes: 1

Views: 3530

Answers (1)

phry

Reputation: 44086

Reading the contents of a .docx file is very complicated, but not impossible. The file is a .zip archive that contains many other files, which in turn contain XML markup describing the document. This is not usually done in a browser, because browsers do not ship the tools it requires by default; you would probably need dozens of additional libraries to deal with it. Something like that should probably be done on a server.
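To make the "a .docx is really a .zip" point concrete, here is a minimal sketch that looks at the first bytes of an uploaded file instead of trusting its extension. The names `sniffFileKind` and `readKind` are hypothetical helpers made up for this illustration; they are not part of react-file-reader. The sketch assumes the usual signatures: ZIP archives start with the bytes `PK\x03\x04`, and PDF files normally start with `%PDF-`. It only identifies the container and does not extract any text.

```javascript
// Hypothetical helper (an assumption for illustration, not a library API):
// guess a file's container format from its leading bytes.
function sniffFileKind(bytes) {
  const startsWith = (signature) =>
    bytes.length >= signature.length &&
    signature.every((value, index) => bytes[index] === value);

  // ZIP local file header "PK\x03\x04"; a .docx file is a ZIP archive,
  // but so are .xlsx, .jar and plain .zip files.
  if (startsWith([0x50, 0x4b, 0x03, 0x04])) return "zip";
  // PDF files normally begin with the ASCII marker "%PDF-".
  if (startsWith([0x25, 0x50, 0x44, 0x46, 0x2d])) return "pdf";
  return "unknown";
}

// Hypothetical browser-side wrapper: read the raw bytes (not text)
// and report the detected kind through a callback.
function readKind(file, onKind) {
  const reader = new FileReader();
  reader.onload = () => onKind(sniffFileKind(new Uint8Array(reader.result)));
  reader.readAsArrayBuffer(file);
}
```

Inside `handleFiles` this could be called as `readKind(files[0], kind => alert(kind))`. A result of "zip" only says the file is a ZIP container; getting the actual text out still requires unpacking the archive and parsing its XML.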

Reading the contents of a PDF, however, is almost completely impossible. A PDF can take many forms, and in the worst case it contains no strings of characters at all, only glyphs or little images of characters, along with coordinates for every single character. Unless you know exactly which tool created the PDF and exactly how its files look internally, it is not feasible to parse that into text. You could instead look into using a component that displays the PDF to your user, if that matches your use case. That should be possible.

Upvotes: 2
