字符串-C#删除多余空格的最快方法

将多余的空白替换为一个空白的最快方法是什么?
例如

foo      bar 

foo bar
25个解决方案
49 votes

最快的方法? 遍历字符串,并按字符逐个构建Regex中的第二个副本,每组空格仅复制一个空格。

较容易键入的Regex变体会产生大量的额外字符串(或浪费时间来构建正则表达式DFA)。

使用比较结果进行编辑:

使用n = 50的[http://ideone.com/NV6EzU,](必须减少ideone上的使用,因为花了很长时间他们不得不杀死我的进程),我得到:

正则表达式:7771ms。

Stringbuilder:894ms。

确实如我们所料,对于这种简单的操作,Regex的效率极低。

Blindy answered 2020-07-23T20:53:00Z
41 votes

您可以使用正则表达式:

static readonly Regex trimmer = new Regex(@"\s\s+");

s = trimmer.Replace(s, " ");

要获得更高的性能,请传递RegexOptions.Compiled

SLaks answered 2020-07-23T20:53:24Z
26 votes

有点晚了,但是我做了一些基准测试,以获取删除多余空白的最快方法。 如果有更快的答案,我希望添加它们。

结果:

  1. NormalizeWhiteSpaceForLoop:156毫秒(由我提供-关于删除所有空白的回答)
  2. NormalizeWhiteSpace:267毫秒(作者Alex K.)
  3. 正则表达式已编译:1950 ms(由SLags)
  4. 正则表达式:2261毫秒(由SLaks)

码:

public class RemoveExtraWhitespaces
{
    public static string WithRegex(string text)
    {
        return Regex.Replace(text, @"\s+", " ");
    }

    public static string WithRegexCompiled(Regex compiledRegex, string text)
    {
        return compiledRegex.Replace(text, " ");
    }

    public static string NormalizeWhiteSpace(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        int current = 0;
        char[] output = new char[input.Length];
        bool skipped = false;

        foreach (char c in input.ToCharArray())
        {
            if (char.IsWhiteSpace(c))
            {
                if (!skipped)
                {
                    if (current > 0)
                        output[current++] = ' ';

                    skipped = true;
                }
            }
            else
            {
                skipped = false;
                output[current++] = c;
            }
        }

        return new string(output, 0, current);
    }

    public static string NormalizeWhiteSpaceForLoop(string input)
    {
        int len = input.Length,
            index = 0,
            i = 0;
        var src = input.ToCharArray();
        bool skip = false;
        char ch;
        for (; i < len; i++)
        {
            ch = src[i];
            switch (ch)
            {
                case '\u0020':
                case '\u00A0':
                case '\u1680':
                case '\u2000':
                case '\u2001':
                case '\u2002':
                case '\u2003':
                case '\u2004':
                case '\u2005':
                case '\u2006':
                case '\u2007':
                case '\u2008':
                case '\u2009':
                case '\u200A':
                case '\u202F':
                case '\u205F':
                case '\u3000':
                case '\u2028':
                case '\u2029':
                case '\u0009':
                case '\u000A':
                case '\u000B':
                case '\u000C':
                case '\u000D':
                case '\u0085':
                    if (skip) continue;
                    src[index++] = ch;
                    skip = true;
                    continue;
                default:
                    skip = false;
                    src[index++] = ch;
                continue;
            }
        }

        return new string(src, 0, index);
    }
}

测试:

[TestFixture]
public class RemoveExtraWhitespacesTest
{
    private const string _text = "foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo ";
    private const string _expected = "foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo ";

    private const int _iterations = 10000;

    [Test]
    public void Regex()
    {
        var result = TimeAction("Regex", () => RemoveExtraWhitespaces.WithRegex(_text));
        Assert.AreEqual(_expected, result);
    }

    [Test]
    public void RegexCompiled()
    {
        var compiledRegex = new Regex(@"\s+", RegexOptions.Compiled);
        var result = TimeAction("RegexCompiled", () => RemoveExtraWhitespaces.WithRegexCompiled(compiledRegex, _text));
        Assert.AreEqual(_expected, result);
    }

    [Test]
    public void NormalizeWhiteSpace()
    {
        var result = TimeAction("NormalizeWhiteSpace", () => RemoveExtraWhitespaces.NormalizeWhiteSpace(_text));
        Assert.AreEqual(_expected, result);
    }

    [Test]
    public void NormalizeWhiteSpaceForLoop()
    {
        var result = TimeAction("NormalizeWhiteSpaceForLoop", () => RemoveExtraWhitespaces.NormalizeWhiteSpaceForLoop(_text));
        Assert.AreEqual(_expected, result);
    }

    public string TimeAction(string name, Func<string> func)
    {
        var timer = Stopwatch.StartNew();
        string result = string.Empty; ;
        for (int i = 0; i < _iterations; i++)
        {
            result = func();
        }

        timer.Stop();
        Console.WriteLine(string.Format("{0}: {1} ms", name, timer.ElapsedMilliseconds));
        return result;
    }
}
Stian Standahl answered 2020-07-23T20:54:14Z
12 votes

我使用以下方法-它们不仅处理所有空格字符,还修剪前导和尾随空格,删除多余的空格,并将所有空格替换为空格字符(因此我们有统一的空格分隔符)。 而且这些方法很快。

public static String CompactWhitespaces( String s )
{
    StringBuilder sb = new StringBuilder( s );

    CompactWhitespaces( sb );

    return sb.ToString();
}

public static void CompactWhitespaces( StringBuilder sb )
{
    if( sb.Length == 0 )
        return;

    // set [start] to first not-whitespace char or to sb.Length

    int start = 0;

    while( start < sb.Length )
    {
        if( Char.IsWhiteSpace( sb[ start ] ) )
            start++;
        else 
            break;
    }

    // if [sb] has only whitespaces, then return empty string

    if( start == sb.Length )
    {
        sb.Length = 0;
        return;
    }

    // set [end] to last not-whitespace char

    int end = sb.Length - 1;

    while( end >= 0 )
    {
        if( Char.IsWhiteSpace( sb[ end ] ) )
            end--;
        else 
            break;
    }

    // compact string

    int dest = 0;
    bool previousIsWhitespace = false;

    for( int i = start; i <= end; i++ )
    {
        if( Char.IsWhiteSpace( sb[ i ] ) )
        {
            if( !previousIsWhitespace )
            {
                previousIsWhitespace = true;
                sb[ dest ] = ' ';
                dest++;
            }
        }
        else
        {
            previousIsWhitespace = false;
            sb[ dest ] = sb[ i ];
            dest++;
        }
    }

    sb.Length = dest;
}
Sergey Povalyaev answered 2020-07-23T20:54:34Z
8 votes
string q = " Hello     how are   you           doing?";
string a = String.Join(" ", q.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries));
Detlef Kroll answered 2020-07-23T20:54:50Z
7 votes
string text = "foo       bar";
text = Regex.Replace(text, @"\s+", " ");
// text = "foo bar"

此解决方案适用于空格,制表符和换行符。 如果只需要空格,请将'\ s'替换为''。

Alec answered 2020-07-23T20:55:10Z
7 votes

对于较大的字符串,我需要其中之一,并在下面提出了例程。

任何连续的空格(包括制表符,换行符)都将替换为normalizeTo中的空格。前导/后缀空格已删除。

使用我的5k-> 5mil字符字符串,它比RegEx快8倍。

internal static string NormalizeWhiteSpace(string input, char normalizeTo = ' ')
{
    if (string.IsNullOrEmpty(input))
        return string.Empty;

    int current = 0;
    char[] output = new char[input.Length];
    bool skipped = false;

    foreach (char c in input.ToCharArray())
    {
        if (char.IsWhiteSpace(c))
        {
            if (!skipped)
            {
                if (current > 0)
                    output[current++] = normalizeTo;

                skipped = true;
            }
        }
        else
        {
            skipped = false;
            output[current++] = c;
        }
    }

    return new string(output, 0, skipped ? current - 1 : current);
}
Alex K. answered 2020-07-23T20:55:39Z
6 votes
string yourWord = "beep boop    baap beep   boop    baap             beep";

yourWord = yourWord .Replace("  ", " |").Replace("| ", "").Replace("|", "");
Abu Dina answered 2020-07-23T20:55:54Z
5 votes

我试过使用StringBuilder来:

  1. 删除多余的空白子字符串
  2. 接受字符,就像布林迪建议的那样,将字符从原始字符串循环

这是我发现的性能和可读性的最佳平衡(使用100,000次迭代计时运行)。 有时,此测试比不那么清晰的版本要快,最多要慢5%。 在我的小型测试字符串上,正则表达式需要4.24倍的时间。

public static string RemoveExtraWhitespace(string str)
    {
        var sb = new StringBuilder();
        var prevIsWhitespace = false;
        foreach (var ch in str)
        {
            var isWhitespace = char.IsWhiteSpace(ch);
            if (prevIsWhitespace && isWhitespace)
            {
                continue;
            }
            sb.Append(ch);
            prevIsWhitespace = isWhitespace;
        }
        return sb.ToString();
    }
TTT answered 2020-07-23T20:56:28Z
3 votes

速度不是很快,但是如果简单起了作用,则可以:

while (text.Contains("  ")) text=text.Replace("  ", " ");
user3029478 answered 2020-07-23T20:56:48Z
2 votes

试试这个:

System.Text.RegularExpressions.Regex.Replace(input, @"\s+", " ");
jaltiere answered 2020-07-23T20:57:08Z
2 votes

这个问题中有一些要求尚不明确,值得深思。

  1. 您是否要使用单个前导或尾随空白字符?
  2. 当您用单个字符替换所有空格时,是否要使该字符保持一致? (即,其中许多解决方案都将\ t \ t替换为\ t,将''替换为''。

这是一个非常有效的版本,它用一个空格替换了所有空格,并在for循环之前删除了所有前导和尾随空格。

  public static string WhiteSpaceToSingleSpaces(string input)
  {
    if (input.Length < 2) 
        return input;

    StringBuilder sb = new StringBuilder();

    input = input.Trim();
    char lastChar = input[0];
    bool lastCharWhiteSpace = false;

    for (int i = 1; i < input.Length; i++)
    {
        bool whiteSpace = char.IsWhiteSpace(input[i]);

        //Skip duplicate whitespace characters
        if (whiteSpace && lastCharWhiteSpace)
            continue;

        //Replace all whitespace with a single space.
        if (whiteSpace)
            sb.Append(' ');
        else
            sb.Append(input[i]);

        //Keep track of the last character's whitespace status
        lastCharWhiteSpace = whiteSpace;
    }

    return sb.ToString();
  }
Jake Drew answered 2020-07-23T20:57:41Z
2 votes

这段代码效果很好。 我没有衡量性能。

string text = "   hello    -  world,  here   we go  !!!    a  bc    ";
string.Join(" ", text.Split().Where(x => x != ""));
// Output
// "hello - world, here we go !!! a bc"
Christian Torrez answered 2020-07-23T20:58:01Z
1 votes

您可以使用indexOf首先获取空白序列的开始位置,然后使用replace方法将空白更改为“”。 从那里,您可以使用抓取的索引,并在该位置放置一个空格字符。

MGZero answered 2020-07-23T20:58:21Z
1 votes

这很有趣,但是在我的PC上,以下方法与Sergey Povalyaev的StringBulder方法一样快-(每200次重复约282ms,10k src字符串)。 虽然不确定内存使用情况。

string RemoveExtraWhiteSpace(string src, char[] wsChars){
   return string.Join(" ",src.Split(wsChars, StringSplitOptions.RemoveEmptyEntries));
}

显然,它适用于任何字符-不仅限于空格。

尽管这不是OP所要求的-但如果您真正需要的是仅用一个实例替换字符串中的特定连续字符,则可以使用这种相对有效的方法:

    string RemoveDuplicateChars(string src, char[] dupes){  
        var sd = (char[])dupes.Clone();  
        Array.Sort(sd);

        var res = new StringBuilder(src.Length);

        for(int i = 0; i<src.Length; i++){
            if( i==0 || src[i]!=src[i-1] || Array.BinarySearch(sd,src[i])<0){
                res.Append(src[i]); 
            }
        }
        return res.ToString();
    }
Zar Shardan answered 2020-07-23T20:58:50Z
1 votes
public string GetCorrectString(string IncorrectString)
    {
        string[] strarray = IncorrectString.Split(' ');
        var sb = new StringBuilder();
        foreach (var str in strarray)
        {
            if (str != string.Empty)
            {
                sb.Append(str).Append(' ');
            }
        }
        return sb.ToString().Trim();
    }
chandudab answered 2020-07-23T20:59:06Z
1 votes

我只是打了个电话,虽然还没有测试。 但是我觉得这很优雅,并且避免使用正则表达式:

    /// <summary>
    /// Removes extra white space.
    /// </summary>
    /// <param name="s">
    /// The string
    /// </param>
    /// <returns>
    /// The string, with only single white-space groupings. 
    /// </returns>
    public static string RemoveExtraWhiteSpace(this string s)
    {
        if (s.Length == 0)
        {
            return string.Empty;
        }

        var stringBuilder = new StringBuilder();
        var whiteSpaceCount = 0;
        foreach (var character in s)
        {
            if (char.IsWhiteSpace(character))
            {
                whiteSpaceCount++;
            }
            else
            {
                whiteSpaceCount = 0;
            }

            if (whiteSpaceCount > 1)
            {
                continue;
            }

            stringBuilder.Append(character);
        }

        return stringBuilder.ToString();
    }
Kris Coleman answered 2020-07-23T20:59:26Z
1 votes

我在这里想念什么吗? 我想出了这个:

// Input: "HELLO     BEAUTIFUL       WORLD!"
private string NormalizeWhitespace(string inputStr)
{
    // First split the string on the spaces but exclude the spaces themselves
    // Using the input string the length of the array will be 3. If the spaces
    // were not filtered out they would be included in the array
    var splitParts = inputStr.Split(' ').Where(x => x != "").ToArray();

   // Now iterate over the parts in the array and add them to the return
   // string. If the current part is not the last part, add a space after.
   for (int i = 0; i < splitParts.Count(); i++)
   {
        retVal += splitParts[i];
        if (i != splitParts.Count() - 1)
        {
            retVal += " ";
        }
   }
    return retVal;
}
// Would return "HELLO BEAUTIFUL WORLD!"

我知道我在这里创建第二个字符串以返回它以及创建splitParts数组。 只是觉得这很简单。 也许我没有考虑到一些潜在的情况。

Tom answered 2020-07-23T20:59:50Z
1 votes

我知道这确实很老,但是压缩空白的最简单方法(用单个“空格”字符替换所有重复出现的空白字符)如下:

    public static string CompactWhitespace(string astring)
    {
        if (!string.IsNullOrEmpty(astring))
        {
            bool found = false;
            StringBuilder buff = new StringBuilder();

            foreach (char chr in astring.Trim())
            {
                if (char.IsWhiteSpace(chr))
                {
                    if (found)
                    {
                        continue;
                    }

                    found = true;
                    buff.Append(' ');
                }
                else
                {
                    if (found)
                    {
                        found = false;
                    }

                    buff.Append(chr);
                }
            }

            return buff.ToString();
        }

        return string.Empty;
    }
Brien Halstead answered 2020-07-23T21:00:10Z
1 votes
public static string RemoveExtraSpaces(string input)
{
    input = input.Trim();
    string output = "";
    bool WasLastCharSpace = false;
    for (int i = 0; i < input.Length; i++)
    {
        if (input[i] == ' ' && WasLastCharSpace)
            continue;
        WasLastCharSpace = input[i] == ' ';
        output += input[i];
    }
    return output;
}
Saeed Taran answered 2020-07-23T21:00:26Z
1 votes

对于那些只想复制粘贴然后继续的人:

    private string RemoveExcessiveWhitespace(string value)
    {
        if (value == null) { return null; }

        var builder = new StringBuilder();
        var ignoreWhitespace = false;
        foreach (var c in value)
        {
            if (!ignoreWhitespace || c != ' ')
            {
                builder.Append(c);
            }
            ignoreWhitespace = c == ' ';
        }
        return builder.ToString();
    }
Mert Akcakaya answered 2020-07-23T21:00:46Z
1 votes

我对C#不太熟悉,因此我的代码不是一种优雅/最有效的代码。 我来这里是为了找到适合我的用例的答案,但是我找不到一个答案(或者我找不到一个答案)。

对于我的用例,我需要使用以下条件对所有空白(WS:{tabtabcr lf})进行规范化:

  • WS可以任意组合
  • 用最重要的WS替换WS序列
  • 在某些情况下,需要保留tab(例如,使用制表符分隔的文件,在这种情况下,还需要保留重复的制表符)。 但是在大多数情况下,它们必须转换为空间。

因此,这是一个示例输入和预期输出(免责声明:我的代码仅在此示例中经过测试)



        Every night    in my            dreams  I see you, I feel you
    That's how    I know you go on

Far across the  distance and            places between us   



You            have                 come                    to show you go on


转换成

Every night in my dreams I see you, I feel you
That's how I know you go on
Far across the distance and places between us
You have come to show you go on

这是我的代码

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main(string text)
    {
        bool preserveTabs = false;

        //[Step 1]: Clean up white spaces around the text
        text = text.Trim();
        //Console.Write("\nTrim\n======\n" + text);

        //[Step 2]: Reduce repeated spaces to single space. 
        text = Regex.Replace(text, @" +", " ");
        // Console.Write("\nNo repeated spaces\n======\n" + text);

        //[Step 3]: Hande Tab spaces. Tabs needs to treated with care because 
        //in some files tabs have special meaning (for eg Tab seperated files)
        if(preserveTabs)
        {
            text = Regex.Replace(text, @" *\t *", "\t");
        }
        else
        {
            text = Regex.Replace(text, @"[ \t]+", " ");
        }
        //Console.Write("\nTabs preserved\n======\n" + text);

        //[Step 4]: Reduce repeated new lines (and other white spaces around them)
                  //into a single new line.
        text = Regex.Replace(text, @"([\t ]*(\n)+[\t ]*)+", "\n");
        Console.Write("\nClean New Lines\n======\n" + text);    
    }
}

在此处查看此代码的实际运行情况:[https://dotnetfiddle.net/eupjIU]

BeNiza answered 2020-07-23T21:01:41Z
1 votes

我不知道这是否是最快的方法,但是我使用了它,这对我来说很有效:

    /// <summary>
    /// Remove all extra spaces and tabs between words in the specified string!
    /// </summary>
    /// <param name="str">The specified string.</param>
    public static string RemoveExtraSpaces(string str)
    {
        str = str.Trim();
        StringBuilder sb = new StringBuilder();
        bool space = false;
        foreach (char c in str)
        {
            if (char.IsWhiteSpace(c) || c == (char)9) { space = true; }
            else { if (space) { sb.Append(' '); }; sb.Append(c); space = false; };
        }
        return sb.ToString();
    }
LL99 answered 2020-07-23T21:02:01Z
-1 votes

无需复杂的代码! 这是一个简单的代码,它将删除所有重复项:

public static String RemoveCharOccurence(String s, char[] remove)
{
    String s1 = s;
    foreach(char c in remove)
    {
        s1 = RemoveCharOccurence(s1, c);
    }

    return s1;
}

public static String RemoveCharOccurence(String s, char remove)
{
    StringBuilder sb = new StringBuilder(s.Length);

    Boolean removeNextIfMatch = false;
    foreach(char c in s)
    {
        if(c == remove)
        {
            if(removeNextIfMatch)
                continue;
            else
                removeNextIfMatch = true;
        }
        else
            removeNextIfMatch = false;

        sb.Append(c);
    }

    return sb.ToString();
}
Dima answered 2020-07-23T21:02:21Z
-1 votes

这非常简单,只需使用.Replace()120方法:

string words = "Hello     world!";
words = words.Replace("\\s+", " ");

输出>>>“ Hello world!”

Bradley Joel Pillay answered 2020-07-23T21:02:45Z
translate from https://stackoverflow.com:/questions/6442421/c-sharp-fastest-way-to-remove-extra-white-spaces