Alexander Engelhardt

Reputation: 1712

UNIX shell-scripting: Split a textfile by its entries

I'm trying to analyze an enormous text file (1.6GB), whose data lines look like this:

20090118025859 -2.400000 78.100000 1023.200000 0.000000
20090118025900 -2.500000 78.100000 1023.200000 0.000000
20090118025901 -2.400000 78.100000 1023.200000 0.000000

I don't even know how many lines there are, but I'm trying to split the file by date. The leftmost number is a timestamp (these lines, for example, are from January 18th, 2009). How can I split this file into pieces according to the date?

The number of entries per date differs, so using split with a constant line count won't work. All I can think of is running grep '^20090118' file > data20090118.dat once per date, but surely there is a way to do all the dates at once, right?

Thanks in advance, Alex

Upvotes: 2

Views: 1719

Answers (3)

pixelbeat

Reputation: 31708

With the caveats that each day needs to have more than one record, and that the output files will contain blank separator lines (this uses GNU uniq and csplit options):

uniq --all-repeated=separate -w8 file | csplit -s - '/^$/' '{*}'

We really should have an option for uniq to output unique records as well. Also, csplit should have an option to suppress the matched line.
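
The pieces written by csplit are named xx00, xx01, and so on. One possible follow-up, sketched here assuming GNU coreutils and that no other files matching xx* exist in the current directory, renames each piece after its date and drops the blank separator lines. The names sample.log and data*.dat are made up for illustration, and days with a single record are still lost, as noted above.

```shell
# Hypothetical sample input: two days with two records each.
cat > sample.log <<'EOF'
20090118025859 -2.400000 78.100000 1023.200000 0.000000
20090118025900 -2.500000 78.100000 1023.200000 0.000000
20090119000000 -2.400000 78.100000 1023.200000 0.000000
20090119000001 -2.300000 78.100000 1023.200000 0.000000
EOF

# Group lines by their first 8 characters, then cut at each blank separator.
uniq --all-repeated=separate -w8 sample.log | csplit -s - '/^$/' '{*}'

# Rename each piece after the date it contains and strip blank lines.
for piece in xx*; do
    day=$(grep -m 1 . "$piece" | cut -c 1-8)   # date of the first non-blank line
    grep . "$piece" > "data$day.dat"           # copy only the non-blank lines
    rm "$piece"
done
```

Every piece after the first starts with the blank separator, which is why the date is taken from the first non-blank line rather than from line 1.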

Upvotes: 0

l0b0

Reputation: 58788

This should work whether or not the items are in date sequence, since the date is read from each line. Because it appends, remove any old .dat files before running it:

while IFS= read -r line
do
    # The first 8 characters of the timestamp are the date (YYYYMMDD)
    date=${line:0:8}
    # Append the line to that date's file
    printf '%s\n' "$line" >> "$date.dat"
done < log.dat

Upvotes: 1

dogbane

Reputation: 274582

Using awk:

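awk '{print > ("data" substr($1,1,8) ".dat")}' myfile

awk counts string positions from 1, so substr($1,1,8) is the YYYYMMDD part of the timestamp, and each line goes to the file named after its date. The parentheses around the file name keep the concatenation unambiguous across awk implementations.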
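
If the input is sorted by time, as the sample lines in the question suggest, only one output file needs to be open at once. This matters mainly for awk implementations that limit the number of simultaneously open files. A minimal sketch of that variant, assuming sorted input; sample.log and the data*.dat names are only for illustration:

```shell
# Hypothetical sample input: two days, sorted by timestamp.
cat > sample.log <<'EOF'
20090118025859 -2.400000 78.100000 1023.200000 0.000000
20090118025900 -2.500000 78.100000 1023.200000 0.000000
20090119000001 -2.400000 78.100000 1023.200000 0.000000
EOF

awk '{
    out = "data" substr($1, 1, 8) ".dat"   # file name from the YYYYMMDD prefix
    if (out != prev) {                     # the date changed
        if (prev != "") close(prev)        # finish the previous day
        prev = out
    }
    print > out
}' sample.log
```

With unsorted input this would not be safe: reopening a closed file with > truncates it, so earlier lines for that date would be lost.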

Upvotes: 5
