Rajesh

Reputation: 35

Search for a string or number range in a file in HDFS

I want to search HDFS and list the files that contain my search string exactly. My second requirement: is there a way to search for a range of values in a file in HDFS?

Suppose the following is my file, with this data:

/user/hadoop/test.txt

101,abc
102,def
103,ghi
104,aaa
105,bbb

Is there a way to search with the range [101-104] so that it returns the files that contain values in that range?

Upvotes: 0

Views: 2892

Answers (1)

sumitya

Reputation: 2681

To display the names of files that match a pattern, loop through the files in the HDFS directory, for example:

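# Keep only regular files: skips the "Found N items" header and directories
hdfs_files=$(hdfs dfs -ls /user/hadoop/ | awk '$1 ~ /^-/ {print $8}')
for file in $hdfs_files; do
  # Count the distinct first-column values that fall in 101-104
  patterns_count=$(hdfs dfs -cat "$file" | grep -oE '^10[1-4],' | sort -u | wc -l)
  # Print the file name only if all four values are present
  if [ "$patterns_count" -eq 4 ]; then
    echo "$file"
  fi
done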
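The loop above handles the numeric range. For the first requirement in the question (files that contain a search string exactly), a fixed-string match avoids regex surprises such as `.` matching any character. The following is only a sketch: the function names `has_string` and `list_files_with_string` are made up for illustration, and it assumes file paths without spaces.

```shell
# has_string STRING: succeed if stdin contains STRING as a literal (non-regex) match.
has_string() {
  grep -qF -- "$1"
}

# list_files_with_string DIR STRING: print each regular file directly under DIR
# whose content contains STRING literally. Requires the hdfs CLI on PATH.
list_files_with_string() {
  dir=$1
  needle=$2
  hdfs dfs -ls "$dir" | awk '$1 ~ /^-/ {print $8}' | while read -r file; do
    # </dev/null keeps hdfs from consuming the list of file names on stdin
    hdfs dfs -cat "$file" </dev/null | has_string "$needle" && echo "$file"
  done
}
```

It could be called as `list_files_with_string /user/hadoop/ '103,ghi'`.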

For the second requirement, "search for a range of values in a file in HDFS", a shell command is enough:

hdfs dfs -cat /user/hadoop/test.txt | grep -E '^10[1-4],'

Output:

101,abc
102,def
103,ghi
104,aaa

Or print only the matching values from the first column:

hdfs dfs -cat /user/hadoop/test.txt | cut -d, -f1 | grep -xE '10[1-4]'

Output:

101
102
103
104
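A character class such as `10[1-4]` only works when the range happens to line up with digit positions. As a rough sketch, assuming the value to compare is the first comma-separated field and is a plain non-negative integer, a numeric comparison in awk handles arbitrary bounds. The function name `filter_range` is hypothetical.

```shell
# filter_range LOW HIGH: print CSV lines from stdin whose first field is an
# integer within [LOW, HIGH] (inclusive). Non-numeric first fields are skipped.
filter_range() {
  awk -F, -v lo="$1" -v hi="$2" \
    '$1 ~ /^[0-9]+$/ && $1 + 0 >= lo + 0 && $1 + 0 <= hi + 0'
}

# Assumed usage against the file from the question:
#   hdfs dfs -cat /user/hadoop/test.txt | filter_range 101 104
```

Because the comparison is numeric rather than textual, the same call works for ranges like 95 to 1200 that a single character class cannot express.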

Upvotes: 1
