This week for Reading and Writing Electronic Text we were asked to do something creative with command line text manipulation. As source material I started working with two thesis papers I wrote in college. One was for a Transportation Geography class and dealt with the impact of mobile phones on transit preferences. The other was for a class on Public Finance, where I had written about GPS (GNSS) systems and the economic structure behind them. I’ve done a lot of work related to transportation, and to a certain extent behavioral economics, while at ITP, so I thought going back and working with these half-remembered papers would be interesting.
First I converted the files from .docx to .txt so I could use them in the terminal. This worked but resulted in really long lines: in this case each paragraph became a single line, with breaks only where the paragraph breaks had been. I tried using the cut command to break these lines into words or even sentences but had a hard time with that. Eventually I used fold to force the paragraphs into lines 80 characters long. fold broke the paragraphs at odd points, so there were fragments of words at the beginning and end of each line. I kept playing around with cut and figured out that I could take the second word of each line and come up with a list of full words that were pseudo-randomly selected from each paper. I thought comparing a random selection of words from each paper could be an interesting way to see how they differ. I guess I was curious whether the topic or my writing style would stand out under this comparison. I took both lists, sorted them and pasted the two columns next to each other in a separate file. On its own this wasn’t very interesting, but when I ran different grep searches for words it would sometimes yield some interesting stuff.
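To make the fold and cut step concrete, here is a minimal sketch on a made-up sentence. The file sample.txt and its contents are hypothetical stand-ins for one long paragraph line, not text from either paper.

```shell
# Work in a throwaway directory so nothing real gets overwritten.
tmp=$(mktemp -d)

# Hypothetical stand-in for one long paragraph line from a converted paper.
printf '%s\n' 'Mobile phones change how riders choose between transit modes because real-time arrival information lowers the cost of waiting.' > "$tmp/sample.txt"

# fold cuts at column 80 even if that lands in the middle of a word.
fold -w 80 "$tmp/sample.txt"

# The first and last fields of a folded line may be word fragments,
# so the second space-delimited field is usually a whole word.
second_words=$(fold -w 80 "$tmp/sample.txt" | cut -d ' ' -f 2 | sort)
echo "$second_words"
```

Here the first folded line ends partway through "arrival", which is the kind of fragment that made the first and last words unusable.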
I thought this was an OK application of the terminal, but I felt like I could have gotten more out of it if I hadn’t spent so much time struggling with the weird line lengths. I would have preferred to pair words from both texts in a more complete and contextual way. However, I was happy that I figured out how to compress most of this process into a couple of lines in the terminal. Mostly it was:
fold < CapstoneEcon.txt | cut -d ' ' -f 2 | sort > AlpListwordsEcon.txt
fold < geogcap.txt | cut -d ' ' -f 2 | sort > GeogFoldSort.txt
paste AlpListwordsEcon.txt GeogFoldSort.txt > CombinedEconGeog.txt
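One possible way around the broken words, which is not what I did above, is fold's -s option, which breaks lines at spaces instead of at a fixed column. A rough sketch, again using a hypothetical sample.txt rather than the real papers:

```shell
# Throwaway directory and a made-up sample line (not from either paper).
tmp=$(mktemp -d)
printf '%s\n' 'Mobile phones change how riders choose between transit modes because real-time arrival information lowers the cost of waiting.' > "$tmp/sample.txt"

# Plain fold: the second line starts with the tail end of a split word.
first_plain=$(fold -w 80 "$tmp/sample.txt" | cut -d ' ' -f 1 | tail -n 1)

# fold -s: lines break after the last space that fits, so words stay whole.
first_spaces=$(fold -s -w 80 "$tmp/sample.txt" | cut -d ' ' -f 1 | tail -n 1)

echo "plain fold: $first_plain"
echo "fold -s:    $first_spaces"
```

With -s the lines are no longer exactly 80 characters wide, but every field on a line is a complete word, so any column could be cut out and compared, not just the second.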