Liam

Reputation: 53

Finding frequency of each item in a column using shell

I'm quite inexperienced with the shell/Mac Terminal, so any help or advice would be greatly appreciated.

I have a very large tab-delimited data set. Here is an example of what the data looks like:

0001    User1    Tweet1
0002    User2    Tweet2
0003    User3    Tweet3
0004    User2    Tweet4
0005    User2    Tweet5

I've been trying to export, as a CSV, a list of each unique user and how many times they appear (i.e. how many tweets they made).

Here's my current attempt at the code:

cut -f 2 Twitter_Data_1 | sort | uniq -c | wc -l > TweetFreq.csv

Ideally, I'd like to export a CSV that looks like this:

User1    1
User2    3
User3    1

Upvotes: 0

Views: 42

Answers (2)

Will

Reputation: 2410

Not the cleanest, but it works:

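#!/bin/bash
mkdir -p tmptweet # Create the temporary directory (no error if it already exists)
# IFS= and -r keep each line intact; the || test also handles a missing final newline
while IFS= read -r line || [ -n "$line" ]; do
    user=$(printf '%s\n' "$line" | cut -f 2) # extract the username (2nd tab-separated field)
    printf '%s\n' "$line" >> "tmptweet/$user" # append the line to that user's file
done < Twitter_Data_1

for file in tmptweet/*; do
    i=$(wc -l < "$file" | tr -d ' ') # count the lines (tweets) for this user ...
    printf '%s\t%s\n' "${file##*/}" "$i" # ... and write "user<TAB>count"
done > TweetFreq.csv # overwrite instead of appending, so re-running is safe
rm -rf tmptweet # Remove the temporary directory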

A temporary directory of per-user files is used to store the values, which is easier than juggling arrays.

Each line of your Twitter_Data_1 is appended to a file named after its user; then the number of lines in each of those files is counted to build TweetFreq.csv.

Test:

Will /home/will # ls
script.sh     Twitter_Data_1
Will /home/will # ./script.sh
Will /home/will # ls
script.sh     Twitter_Data_1     TweetFreq.csv
Will /home/will # cat TweetFreq.csv
User1        1
User2        3
User3        1
Will /home/will #
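
For comparison with the temp-file approach, here is a minimal sketch of the "array" approach with the counting done in awk instead (a swapped-in technique, not the answer's method). It makes one pass over the file, keying an awk associative array on the username, and assumes the input is tab-separated with the username in column 2. `count_users` is a hypothetical helper name. Bash's own associative arrays (`declare -A`) need bash 4 or later, while macOS ships bash 3.2 as /bin/bash, so letting awk count also avoids that version issue.

```shell
#!/bin/sh
# count_users (hypothetical name): print "user<TAB>count" for a tab-separated
# file whose 2nd column is the username, using a single awk pass.
count_users() {
    awk -F '\t' '{ count[$2]++ }
        END { for (user in count) print user "\t" count[user] }' "$1" |
        sort    # awk's "for (key in array)" order is unspecified, so sort the result
}
```

Usage: `count_users Twitter_Data_1 > TweetFreq.csv`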

Upvotes: 0

mathB

Reputation: 634

$ awk -F '\t' '{ print $2 }' tweet | sort | uniq -c

Output:

  1 User1
  3 User2
  1 User3
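
The question's attempt printed a single number because its final `wc -l` counts the lines that `uniq -c` emits, i.e. the number of distinct users. A minimal sketch that keeps the pipeline above but adds one more awk step to swap `uniq -c`'s "count user" columns into the "User<TAB>count" layout asked for in the question. It assumes tab-separated input with the username in column 2 and usernames without spaces; `tweet_freq` is a hypothetical helper name.

```shell
#!/bin/sh
# tweet_freq (hypothetical name): per-user tweet counts as "user<TAB>count".
tweet_freq() {
    awk -F '\t' '{ print $2 }' "$1" |                # username column
        sort |                                       # group identical names together
        uniq -c |                                    # "      3 User2"
        awk 'BEGIN { OFS = "\t" } { print $2, $1 }'  # -> "User2<TAB>3"
}
```

Usage: `tweet_freq Twitter_Data_1 > TweetFreq.csv`. For a comma-separated file instead, change `OFS = "\t"` to `OFS = ","`.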

Upvotes: 2
