Reputation: 3
I am working through the book Machine Learning: Hands-On for Developers by Jason Bell. I got quite far until I had to connect my Spring XD streams to Hadoop. I am running Spring XD 1.2.1 and Hadoop (I have tried both 1.2.1 and 2.6.0), which is on port 9000. In the tutorial we are supposed to take a Twitter stream and pipe it to a file in Hadoop, but when I created and deployed that stream, the file it created was not being populated with tweets. To make things simpler, I am now just trying to get a stream connected to HDFS by creating this one:
stream create --name ticktock --definition "time | hdfs" --deploy
which should pipe the time to the file /xd/ticktock/ticktock-0.txt.tmp. However, when I run the command
hadoop fs -cat /xd/ticktock/ticktock-0.txt.tmp
it prints nothing, which leads me to assume that no data is reaching the file. I did place a tap on this stream and ran it to a local file. That file recorded the times correctly, so I know the stream is working and producing output; it's just not reaching Hadoop for some reason.
The file does get created in Hadoop, so Hadoop is not ignoring the stream completely; there is just nothing inside the file it creates.
I did find someone who had the same problem, and they changed their VM networking to NAT or something similar, but I am not using a VM.
I have tried chmod-ing the xd folder to 777, I have made sure that I can ssh to my local machine without a password, and I have made sure that there is a datanode running in my Hadoop cluster. I have also made sure that cat works by placing a file I created into HDFS and then running the cat command on it, both from within the Spring XD shell and from a regular terminal.
Unfortunately I am at a loss. Could someone help me out with this?
If you need any information about my Hadoop cluster or Spring XD setup, let me know what. I am still a newbie with these technologies.
Upvotes: 0
Views: 258
Reputation: 3
Okay, I fixed it. I re-read the error message and saw that there were no datanodes running again. I restarted Hadoop, this time with 2.6.0, then ran the test stream for a couple of seconds and destroyed it. Sure enough, that did the trick. Thanks Satish Srinivasan, I had no idea the stream had to be destroyed before the file could be read.
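For anyone hitting the same thing, the two checks above (is a datanode actually up, and is the file readable once the stream has been destroyed) can be sketched roughly like this. It assumes Hadoop 2.x with `hdfs` and `hadoop` on the PATH; `check_sink` is just a hypothetical helper name, and the assumption that the closed file ends in `.txt` comes from the `.tmp` name seen in the question.

```shell
# Sketch only: assumes Hadoop 2.x with `hdfs` and `hadoop` on the PATH.
# check_sink is a hypothetical helper name, not part of Hadoop or Spring XD.
check_sink() {
  dir="$1"                        # sink directory, e.g. /xd/ticktock
  hdfs dfsadmin -report           # the report should list at least one live datanode
  hadoop fs -ls "$dir"            # a closed file should no longer carry the .tmp suffix
  hadoop fs -cat "$dir/*.txt"     # glob is quoted so HDFS expands it, not the local shell
}
```

Run `stream destroy --name ticktock` in the Spring XD shell first, then call `check_sink /xd/ticktock` from a regular terminal.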
Upvotes: 0
Reputation: 36
1. You can see the files in the hdfs sink once you destroy the stream.
2. Also, rollover: even while the stream is alive, once the stored data size exceeds 1G (the default value), Spring XD will roll over the 1G of content to an HDFS file, create a new tmp file, and store the current ticktock values in it.
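With one short timestamp per second, the 1G default will effectively never be reached by a test stream, so it can help to lower the threshold while experimenting. The following is a rough sketch for the Spring XD shell; the `--rollover` and `--idleTimeout` option names and value formats are as I recall them from the Spring XD 1.x hdfs sink documentation, so treat them as assumptions and check them against your version. The stream names are made up for the example.

```
stream create --name ticktock-small --definition "time | hdfs --rollover=1M" --deploy
stream create --name ticktock-idle --definition "time | hdfs --idleTimeout=10000" --deploy
```

The first should roll the tmp file over after roughly 1 MB instead of 1G. The second should close the file after 10 seconds without writes (the value is in milliseconds), which will not trigger while `time` keeps emitting every second but is useful for bursty sources such as a Twitter stream.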
Thanks S.Satish
Upvotes: 0