You could also analyze a list of passwords (characters) and passphrases with a similar technique, to find out what not to use / how to target an attack (depending on which side you are on).

Listing the most uncommon words might also reveal something about the content and meaning of the text.

Case could be used to identify names and possibly create a draft of meta-information keywords; repeat with a few texts and then use some network-visualization / mindmapper tool to create a map of how everything connects (you could do that with foreign words too, I suppose).

Some rudimentary algorithms use the number of certain words to determine the writer's mood, as well as whether the text is happy/angry/etc. ("You have used verbs so many times, I think you are angry. I can not let you continue until you have calmed down. … Dave, why are you writing gibberish with your forehead again so many times?")

The arguments to `tr` should each be enclosed in single quotes, since otherwise the shell will treat those character matches to `tr` as candidates for expansion by the shell itself, and hence your command will malfunction randomly depending on your current directory, the files therein, and your shell "glob" settings. As an illustration, my current directory contains a file called t, i.e. … will give me the details of that file, and no doubt of some junk temporary file created years ago and inadvertently not cleaned up by me.

Shell learning: keep in mind all the punctuation characters that have special meaning in the shell (there are lots of them), and always choose the correct type of quotes (or a backslash) to neuter that meaning when that is the desired behaviour. Also, don't continue the last part of a command with a trailing `\`.

As an additional foreign-language study use case: download a movie subtitle file from, e.g. …

A similar, potentially useful script for extracting entire lines that begin with a letter or symbol, in their original order, and excluding lines that start with digits: `#!/bin/bash` …
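The body of that line-extracting script did not survive in this copy. A minimal sketch of one way to do it (the `grep` pattern is my assumption, not the original author's code):

```shell
#!/bin/bash
# Print, in their original order, only the lines that begin with a
# letter or symbol -- i.e. drop every line whose first character is a digit.
# Usage: ./nondigit_lines.sh source.txt > lines_out.txt
grep -v '^[[:digit:]]' "$1"
```

Because `grep` reads the file top to bottom and only filters, the surviving lines keep their original order, as the description above requires.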
The relevant fragments of the script: for the Unique Word List, fill in the file destination of the extracted word list, e.g. *word_list_out.txt* (any existing contents will be overwritten; the file will be created automatically if not present), ending in `sort --dictionary-order > word_list_out.txt`; for a sorted list of only the 100 (etc.) most frequently used words in the source file, fill in the destination, e.g. *word_freq_out.txt* (likewise overwritten), ending in `sort -rn | head -n 100 > word_freq_out.txt` (change "100" to a different number if desired). Either of the two operations can be disabled with a "#" at the start of its line.

You could add an initial operation to auto-download the source text from a website, but personally I would prefer to find the exact block of text I want, then copy and paste it myself.

If, for example, you're studying a foreign language, you can use this script to personalize a vocabulary list for a particular article, piece of literature, professional or technical document, etc., so as to greatly focus your study, and with source material selected by you. You can even import the generated lists into the Anki e-flashcard application so you can easily learn and review. And the Anki flatpak works on the Librem 5, so you can study on the go.

Examples, based on several paragraphs of an Italian text I copied from a website: …

I could have used the -i option with uniq to make the script ignore case (see `uniq --help`). I've added this to the OP.

"Laziness" is one of the most powerful forces in humanity (hey, it's why computers were built). It's good that you thought of something positive to use it on.

My thought pattern led me to "text fingerprinting", where characteristics of a text (including choice of expressions and synonyms, frequency of "and" etc., common spelling errors, and so on) are used to identify who wrote what, and possibly to identify a person (to a certain statistical accuracy or inaccuracy).
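On the case-insensitivity point mentioned above (`uniq -i`), a quick sketch of how it behaves; `sort -f` folds case while sorting so the case variants end up adjacent, which is what lets `uniq` collapse them:

```shell
# "Word", "word" and "WORD" count as one entry when uniq ignores case.
# sort -f groups the case variants together; uniq -ci counts them
# case-insensitively; the final sort -rn puts the biggest count first.
printf 'Word\nword\nWORD\nother\n' | sort -f | uniq -ci | sort -rn
```

Without `-i`, the three spellings of "word" would survive as separate entries, which is exactly the problem being fixed in the script.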
This script, which I found online and then modified slightly, consists of two operations:

- Extract all unique words contained in a text file, list them alphabetically, and save them as a separate file.
- Count all the unique words in the file, determine their frequency of use, and save the top 100 (or a higher or lower number) in a list sorted by frequency.

Save it as e.g. wordlist.sh, then copy a source text to a file and save it.

From the top of the script: it extracts all unique words from a text and creates an alphabetical list of all words used, plus a separate frequency list of the 100 most used words in the source file. It assumes the file containing the text to be parsed is located in the home directory; change the path if desired, e.g. */home/username/Desktop/filename.txt*, etc. Fill in the name of the file that contains the source text.
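Pieced together from the comment and command fragments quoted in this post, a plausible reconstruction of the full script might look like the following. The `tr`-based word-splitting step is my assumption; the output file names, the `sort` options, and the `head -n 100` stage are the ones mentioned above:

```shell
#!/bin/bash
# Sketch of wordlist.sh, assembled from the fragments quoted in the post.
# The tr tokenization is an assumption; output file names match the post.

src="$HOME/filename.txt"   # source text; change the path if desired

# One lowercase word per line: fold case, then squeeze every run of
# non-letters into a single newline. Note the single quotes around the
# tr arguments, so the shell cannot glob-expand them.
words() {
    tr '[:upper:]' '[:lower:]' < "$src" | tr -cs '[:alpha:]' '\n' | sed '/^$/d'
}

# Unique Word List, alphabetical; overwrites word_list_out.txt.
words | sort --dictionary-order -u > word_list_out.txt

# Most Frequently Used Words List; change "100" to a different number
# if desired; overwrites word_freq_out.txt.
words | sort | uniq -c | sort -rn | head -n 100 > word_freq_out.txt
```

As the original comments suggest, either operation can still be disabled by putting a "#" at the start of its line.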