I have the following text file:
Article_Title 1st_Author Publication_Year Language Citations
Über die Theorie des Stoßes zwischen Atomen und elektrisch geladenen Teilchen Fermi 1924 German 54
Zur Quantelung des idealen einatomigen Gases Fermi 1926 German 333
Eine statistische Methode zur Bestimmung einiger Eigenschaften des Atoms und ihre Anwendung auf die Theorie des periodischen Systems der Elemente Fermi 1928 German 1833
Über die magnetischen Momente der Atomkerne Fermi 1929 German 795
Über das Intensitätsverhältnis der Dublettkomponenten der Alkalien Fermi 1929 German 134
Über den Ramaneffekt des Kohlendioxyds Fermi 1931 German 594
Quantum Theory of Radiation Fermi 1932 English 951
Zur Theorie der Hyperfeinstruktur Fermi 1933 German 280
Possible Production of Elements of Atomic Number Higher than 92 Fermi 1934 English 175
Versuch einer Theorie der β-Strahlen Fermi 1934 German 525
Sopra lo Spostamento per Pressione delle Righe Elevate delle Serie Spettrali Fermi 1934 Italian 901
Tentativo di una Teoria Dei Raggi β Fermi 1934 Italian 475
On the Absorption and the Diffusion of Slow Neutrons Almadi 1936 English 199
The Ionization Loss of Energy in Gases and in Condensed Materials Fermi 1940 English 710
The Capture of Negative Mesotrons in Matter Fermi 1947 English 1156
Interference Phenomena of Slow Neutrons Fermi 1947 English 301
On the Origin of the Cosmic Radiation Fermi 1949 English 3309
Are Mesons Elementary Particles? Fermi 1949 English 498
Angular Distribution of the Pions Produced in High Energy Nuclear Collisions Fermi 1951 English 324
Multiple Production of Pions in Nucleon-Nucleon Collisions at Cosmotron Energies Fermi 1953 English 118
The first column is the article title, the second is the last name of the first author, the third is the publication year, the fourth is the article's language, and the fifth is the number of citations.
I would like to convert it into something like this:
1924 54
1926 333
1928 1833
1929 795
1929 134
1931 594
1932 951
1933 280
1934 175
1934 525
1934 901
1934 475
1936 199
1940 710
1947 1156
1947 301
1949 3309
1949 498
1951 324
1953 118
So I need to remove the first, second, and fourth columns.
The problem is the Article_Title column. If the article titles contained no spaces, I would just need to run the following commands:
sed -i '1,2d' plotting_data.txt # delete the first two (header) lines
awk '{$1=$2=$4=""; print $0}' plotting_data.txt > tmp && mv tmp plotting_data.txt # blank out the first, second and fourth columns
The problem is that the article titles contain spaces between words, so I don't know how to tell awk or sed to remove that column. Could you help me?
I am using the following awk version:
mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan
compiled limits: max NF 32767 sprintf buffer 2040
Also, the whitespace between fields is all blank characters.
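Because only the title column can contain spaces, another option is to count fields from the end of each line instead of from the start: the citation count is always `$NF` and the year is always `$(NF-2)`. This is a minimal sketch, assuming the author name and the language are always single words (true for every row in the sample above). The file name `plotting_sample.txt` and the `---` second header line are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical sample file: two header lines, then data rows whose
# titles contain spaces (the '---' line is only a placeholder).
printf '%s\n' \
  'Article_Title 1st_Author Publication_Year Language Citations' \
  '---' \
  'Zur Quantelung des idealen einatomigen Gases Fermi 1926 German 333' \
  'Are Mesons Elementary Particles? Fermi 1949 English 498' > plotting_sample.txt

# NR > 2 skips the two header lines, so the separate sed step is not needed.
# Fields are counted from the end: $NF is Citations, $(NF-2) is Publication_Year.
# Assumes the author and language fields never contain spaces.
awk 'NR > 2 { print $(NF-2), $NF }' plotting_sample.txt
```

This prints `1926 333` and `1949 498`, and it does not depend on the columns being aligned.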
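Assuming the whitespace in your sample is all blank chars, this will work using any awk in any shell on any UNIX box. It takes the starting position of the 1st_Author column from the header line and cuts every data line at that position, so it relies on the columns being aligned: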
$ awk 'NR==1{beg=index($0,$2)} NR>2{$0=substr($0,beg); print $2, $4}' file
1924 54
1926 333
1928 1833
1929 795
1929 134
1931 594
1932 951
1933 280
1934 175
1934 525
1934 901
1934 475
1936 199
1940 710
1947 1156
1947 301
1949 3309
1949 498
1951 324
1953 118
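The sample as shown here is not column-aligned, so this sketch builds a small aligned file with `printf '%-47s'` (padding each title to 47 characters) to show the `index`/`substr` approach working. The file name `aligned_sample.txt`, the width 47 and the `---` second header line are illustrative assumptions, not taken from the real file.

```shell
#!/bin/sh
# Hypothetical fixed-width sample: every title is padded to 47 characters,
# so 1st_Author and every author name start at character 48.
{
  printf '%-47s%s\n' 'Article_Title' '1st_Author Publication_Year Language Citations'
  printf '%s\n' '---'
  printf '%-47s%s\n' 'Zur Quantelung des idealen einatomigen Gases' 'Fermi 1926 German 333'
  printf '%-47s%s\n' 'Quantum Theory of Radiation' 'Fermi 1932 English 951'
} > aligned_sample.txt

# NR==1: remember where the 1st_Author header starts.
# NR>2:  drop everything before that position (the title), which re-splits
#        the fields, so $2 is the year and $4 is the citation count.
awk 'NR==1{beg=index($0,$2)} NR>2{$0=substr($0,beg); print $2, $4}' aligned_sample.txt
```

This prints `1926 333` and `1932 951`. If a title ran past the start of the author column, part of it would survive the `substr` and shift the field numbers, which is why the alignment matters.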