Sometimes your data set is quite text heavy. This can be for a lot of different reasons. Maybe the raw data are actually taken from text sources (e.g., newspaper articles), or maybe your data set contains a lot of free responses to survey questions, in which people can write whatever text they like in response to some query. Or maybe you just need to rejig some of the text used to describe nominal scale variables. Regardless of what the reason is, you’ll probably want to know a little bit about how to handle text in R.

Some things you already know how to do: I’ve discussed the use of nchar() to calculate the number of characters in a string (Section 3.8.1), and a lot of the general purpose tools that I’ve discussed elsewhere (e.g., the == operator) can be applied to text data as well as to numeric data. However, because text data is quite rich, and generally not as well structured as numeric data, R provides a lot of additional tools that are quite specific to text. In this section I discuss only those tools that come as part of the base packages, but there are other possibilities out there: the stringr package provides a powerful alternative that is a lot more coherent than the basic tools, and is well worth looking into.

The first task I want to talk about is how to shorten a character string. For example, suppose that I have a vector that contains the names of several different animals:

    animals <- c( "cat", "dog", "kangaroo", "whale" )

It might be useful in some contexts to extract the first three letters of each name. This is often useful when annotating figures, or when creating variable labels: it’s often very inconvenient to use the full name, so you want to shorten it to a short code for space reasons. The strtrim() function can be used for this purpose. It has two arguments: x is a vector containing the text to be shortened and width specifies the number of characters to keep. When applied to the animals data, here’s what we get:

    strtrim( x = animals, width = 3 )
    # "cat" "dog" "kan" "wha"

Note that the only thing that strtrim() does is chop off excess characters at the end of a string. It doesn’t insert any whitespace characters to fill them out if the original string is shorter than the width argument. For example, if I trim the animals data to 4 characters, here’s what I get:

    strtrim( x = animals, width = 4 )
    # "cat" "dog" "kang" "whal"
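
To make the point about strtrim() not padding a little more concrete, here’s a minimal sketch that checks the lengths of the trimmed strings with nchar(). The padding step using formatC() and the comparison with substr() are my own additions for illustration, not part of strtrim() itself, and the variable names short, padded and first3 are made up for this example.

```r
# The same animals vector used in the text
animals <- c( "cat", "dog", "kangaroo", "whale" )

# strtrim() only chops: strings shorter than width keep their original length
short <- strtrim( x = animals, width = 4 )
nchar( short )    # 3 3 4 4

# If fixed-width labels are wanted, padding has to be done as a separate step.
# One option (an assumption about what you might want) is formatC(),
# where flag = "-" left-justifies, i.e. adds the spaces on the right
padded <- formatC( short, width = 4, flag = "-" )
nchar( padded )   # 4 4 4 4

# substr() is another base tool; taking characters 1 to 3 matches
# what strtrim() gives for these plain ASCII names
first3 <- substr( animals, start = 1, stop = 3 )
```

For simple names like these, strtrim() and substr() give the same answer, so which one to use is mostly a matter of taste. The padding step is only worth the trouble if the labels need to line up, for example in a plain text table.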