Reputation: 53
I'm quite inexperienced with the shell / Mac terminal, so any help or advice would be greatly appreciated.
I have a very large tab-delimited data set. Here is an example of what the data looks like:
0001 User1 Tweet1
0002 User2 Tweet2
0003 User3 Tweet3
0004 User2 Tweet4
0005 User2 Tweet5
I've been trying to export a CSV listing each unique user and how many times they appear (i.e. how many tweets they made).
Here's my current attempt at the code:
cut -f 2 Twitter_Data_1 |sort | uniq -c | wc -l > TweetFreq.csv
Ideally, I'd like to export a CSV that looks like:
User1 1
User2 3
User3 1
Upvotes: 0
Views: 42
Reputation: 2410
Not the cleanest but it works
#!/bin/bash
mkdir -p tmptweet # create the temp directory
while IFS= read -r line; do
    user=$(printf '%s\n' "$line" | cut -f 2) # extract the username (2nd tab-delimited field)
    echo "$line" >> "tmptweet/$user" # append the line to that user's counter file
done < Twitter_Data_1
for file in tmptweet/*; do
    i=$(wc -l < "$file" | tr -d ' ') # count the lines for each user (strip the padding macOS wc adds) ...
    echo "${file##*/} $i" # ... and emit "user count"
done > TweetFreq.csv # overwrite the final file on each run
rm -rf tmptweet # remove the temp directory
A temporary directory with temporary files is used to store the values, which is easier than juggling arrays.
Each line of your Twitter_Data_1 is appended to a file named after the username, then the number of lines in each of those files is counted to create the TweetFreq.csv file.
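For comparison, the same per-user count can be kept in an awk associative array instead of temporary files. This is only a minimal sketch, assuming tab-delimited input with the username in the second field. The names sample_tweets.tsv and sample_freq.csv are hypothetical and used only for illustration, and the comma separator is an assumption about the CSV you want.

```shell
# Hypothetical sample input standing in for Twitter_Data_1 (tab-delimited).
printf '0001\tUser1\tTweet1\n0002\tUser2\tTweet2\n0003\tUser3\tTweet3\n0004\tUser2\tTweet4\n0005\tUser2\tTweet5\n' > sample_tweets.tsv

# Increment a counter keyed by the username (field 2) for every line,
# then print "user,count" once all input is read.
# awk's for-in order is unspecified, so sort makes the output deterministic.
awk -F '\t' '{ count[$2]++ } END { for (u in count) print u "," count[u] }' sample_tweets.tsv \
    | sort > sample_freq.csv

cat sample_freq.csv
```

Replace the "," with " " or "\t" if you would rather have space- or tab-separated output.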
Test:
Will /home/will # ls
script.sh Twitter_Data_1
Will /home/will # ./script.sh
Will /home/will # ls
script.sh Twitter_Data_1 TweetFreq.csv
Will /home/will # cat TweetFreq.csv
User1 1
User2 3
User3 1
Will /home/will #
Upvotes: 0
Reputation: 634
$ awk -F '\t' '{ print $2 }' tweet | sort | uniq -c
Output:
1 User1
3 User2
1 User3
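To get the "user first, count second" layout from the question, the uniq -c output can be passed through one more awk step that swaps the two columns. Note that the trailing wc -l in the question's attempt replaces the per-user counts with a single line count, so it is dropped here. A rough sketch, assuming usernames contain no whitespace; sample_tweets.tsv and sample_freq_uniq.csv are hypothetical names, and the comma is an assumed separator.

```shell
# Hypothetical sample input standing in for the real tab-delimited file.
printf '0001\tUser1\tTweet1\n0002\tUser2\tTweet2\n0003\tUser3\tTweet3\n0004\tUser2\tTweet4\n0005\tUser2\tTweet5\n' > sample_tweets.tsv

# cut -f 2 : keep the username column (tab is cut's default delimiter)
# sort     : group identical usernames together so uniq can count them
# uniq -c  : prefix each distinct username with its number of occurrences
# awk      : swap the columns from "count user" to "user,count"
cut -f 2 sample_tweets.tsv | sort | uniq -c | awk '{ print $2 "," $1 }' > sample_freq_uniq.csv

cat sample_freq_uniq.csv
```

awk's default field splitting ignores the leading padding that uniq -c adds, so no extra trimming is needed.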
Upvotes: 2